Message ID | 20240814123429.20457-3-will@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | KVM: arm64: Fix a couple of pKVM/nVHE TLB invalidation bugs | expand |
Hi Will, On Wed, 14 Aug 2024 13:34:29 +0100, Will Deacon <will@kernel.org> wrote: > > When the target context passed to enter_vmid_context() matches the > current running context, the function returns early without manipulating > the registers of the stage-2 MMU. This can result in a stale VMID due to > the lack of an ISB instruction in exit_vmid_context() after writing the This refers to the exit_vmid_context() *before* a subsequent enter_vmid_context(), right? > VTTBR when ARM64_WORKAROUND_SPECULATIVE_AT is not enabled. I'm not sure I completely get the failure mode, so please bear with me. > > For example, with pKVM enabled: > > // Initially running in host context > enter_vmid_context(guest); > -> __load_stage2(guest); isb // Writes VTCR & VTTBR > exit_vmid_context(guest); > -> __load_stage2(host); // Restores VTCR & VTTBR > > enter_vmid_context(host); > -> Returns early as we're already in host context Is this the code you are referring to? vcpu = host_ctxt->__hyp_running_vcpu; [...] if (vcpu) { /* * We're in guest context. However, for this to work, this needs * to be called from within __kvm_vcpu_run(), which ensures that * __hyp_running_vcpu is set to the current guest vcpu. */ if (mmu == vcpu->arch.hw_mmu || WARN_ON(mmu != host_s2_mmu)) return; cxt->mmu = vcpu->arch.hw_mmu; } else { /* We're in host context. */ if (mmu == host_s2_mmu) return; cxt->mmu = host_s2_mmu; } So I take it that no vcpu is running at this point, and that we have done a vcpu_put(). Is there an actual path within pKVM that causes a guest TLBI to be followed by a host __kvm_tlb_flush_vmid() *without* a CSE? I can't convinced myself that such a path exist in the current upstream code base. > tlbi vmalls12e1is // !!! Can use the stale VMID as we > // haven't performed context > // synchronisation since restoring > // VTTBR.VMID > > Add an unconditional ISB instruction to exit_vmid_context() after > restoring the VTTBR. This already existed for the > ARM64_WORKAROUND_SPECULATIVE_AT path, so we can simply hoist that onto > the common path. > > Cc: Marc Zyngier <maz@kernel.org> > Cc: Oliver Upton <oliver.upton@linux.dev> > Cc: Fuad Tabba <tabba@google.com> > Fixes: 58f3b0fc3b87 ("KVM: arm64: Support TLB invalidation in guest context") > Signed-off-by: Will Deacon <will@kernel.org> > --- > arch/arm64/kvm/hyp/nvhe/tlb.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/arch/arm64/kvm/hyp/nvhe/tlb.c b/arch/arm64/kvm/hyp/nvhe/tlb.c > index ca3c09df8d7c..48da9ca9763f 100644 > --- a/arch/arm64/kvm/hyp/nvhe/tlb.c > +++ b/arch/arm64/kvm/hyp/nvhe/tlb.c > @@ -132,10 +132,10 @@ static void exit_vmid_context(struct tlb_inv_context *cxt) > else > __load_host_stage2(); > > - if (cpus_have_final_cap(ARM64_WORKAROUND_SPECULATIVE_AT)) { > - /* Ensure write of the old VMID */ > - isb(); > + /* Ensure write of the old VMID */ > + isb(); > > + if (cpus_have_final_cap(ARM64_WORKAROUND_SPECULATIVE_AT)) { > if (!(cxt->sctlr & SCTLR_ELx_M)) { > write_sysreg_el1(cxt->sctlr, SYS_SCTLR); > isb(); The patch itself makes sense to me. I'd just like to understand how we end-up in that situation. Thanks, M.
Hey Marc, Thanks for having a look. On Wed, Aug 14, 2024 at 02:30:53PM +0100, Marc Zyngier wrote: > On Wed, 14 Aug 2024 13:34:29 +0100, > Will Deacon <will@kernel.org> wrote: > > > > When the target context passed to enter_vmid_context() matches the > > current running context, the function returns early without manipulating > > the registers of the stage-2 MMU. This can result in a stale VMID due to > > the lack of an ISB instruction in exit_vmid_context() after writing the > > This refers to the exit_vmid_context() *before* a subsequent > enter_vmid_context(), right? Yes! > > VTTBR when ARM64_WORKAROUND_SPECULATIVE_AT is not enabled. > > I'm not sure I completely get the failure mode, so please bear with me. Of course, it's confused me a tonne of times and writing the commit message was a real pain. > > For example, with pKVM enabled: > > > > // Initially running in host context > > enter_vmid_context(guest); > > -> __load_stage2(guest); isb // Writes VTCR & VTTBR > > exit_vmid_context(guest); > > -> __load_stage2(host); // Restores VTCR & VTTBR > > > > enter_vmid_context(host); > > -> Returns early as we're already in host context > > Is this the code you are referring to? > > vcpu = host_ctxt->__hyp_running_vcpu; > > [...] > > if (vcpu) { > /* > * We're in guest context. However, for this to work, this needs > * to be called from within __kvm_vcpu_run(), which ensures that > * __hyp_running_vcpu is set to the current guest vcpu. > */ > if (mmu == vcpu->arch.hw_mmu || WARN_ON(mmu != host_s2_mmu)) > return; > > cxt->mmu = vcpu->arch.hw_mmu; > } else { > /* We're in host context. */ > if (mmu == host_s2_mmu) > return; > > cxt->mmu = host_s2_mmu; > } Yes, this is the "early return" where we don't bother touching the MMU because we're already in the correct context. > So I take it that no vcpu is running at this point, and that we have > done a vcpu_put(). > > Is there an actual path within pKVM that causes a guest TLBI to be > followed by a host __kvm_tlb_flush_vmid() *without* a CSE? I can't > convinced myself that such a path exist in the current upstream code > base. I think you're right that this can't happen upstream. We see the issue in Android when reclaiming pages from a guest during teardown. That amounts to unmapping pages from the guest, poisoning them and mapping them back into the host. Mapping them into the host can then trigger table -> block conversion and the associated TLB invalidation wasn't effective because it was still using the guest VMID. We can carry this patch in Android if you prefer, but given that {enter,exit}_vmid_context() are upstream, it would be nice to land the fix so that we don't run into this bug again in future (it took some debugging!). Cheers, Will
On Thu, 15 Aug 2024 13:08:03 +0100, Will Deacon <will@kernel.org> wrote: > > > Is there an actual path within pKVM that causes a guest TLBI to be > > followed by a host __kvm_tlb_flush_vmid() *without* a CSE? I can't > > convinced myself that such a path exist in the current upstream code > > base. > > I think you're right that this can't happen upstream. We see the issue > in Android when reclaiming pages from a guest during teardown. That > amounts to unmapping pages from the guest, poisoning them and mapping > them back into the host. Mapping them into the host can then trigger > table -> block conversion and the associated TLB invalidation wasn't > effective because it was still using the guest VMID. > > We can carry this patch in Android if you prefer, but given that > {enter,exit}_vmid_context() are upstream, it would be nice to land the > fix so that we don't run into this bug again in future (it took some > debugging!). I think it is definitely worth addressing, and given that this is nVHE only, an extra CSE isn't going to show on the radar. The question is more whether this is 6.11 or 6.12 material. If that's not an immediate fix for upstream, I'm tempted to queue it for 6.12. Does this work for you? M.
On Thu, Aug 15, 2024 at 01:31:54PM +0100, Marc Zyngier wrote: > On Thu, 15 Aug 2024 13:08:03 +0100, > Will Deacon <will@kernel.org> wrote: > > > > > Is there an actual path within pKVM that causes a guest TLBI to be > > > followed by a host __kvm_tlb_flush_vmid() *without* a CSE? I can't > > > convinced myself that such a path exist in the current upstream code > > > base. > > > > I think you're right that this can't happen upstream. We see the issue > > in Android when reclaiming pages from a guest during teardown. That > > amounts to unmapping pages from the guest, poisoning them and mapping > > them back into the host. Mapping them into the host can then trigger > > table -> block conversion and the associated TLB invalidation wasn't > > effective because it was still using the guest VMID. > > > > We can carry this patch in Android if you prefer, but given that > > {enter,exit}_vmid_context() are upstream, it would be nice to land the > > fix so that we don't run into this bug again in future (it took some > > debugging!). > > I think it is definitely worth addressing, and given that this is nVHE > only, an extra CSE isn't going to show on the radar. > > The question is more whether this is 6.11 or 6.12 material. If that's > not an immediate fix for upstream, I'm tempted to queue it for 6.12. > > Does this work for you? 6.12 is absolutely fine, thank you! Will
diff --git a/arch/arm64/kvm/hyp/nvhe/tlb.c b/arch/arm64/kvm/hyp/nvhe/tlb.c index ca3c09df8d7c..48da9ca9763f 100644 --- a/arch/arm64/kvm/hyp/nvhe/tlb.c +++ b/arch/arm64/kvm/hyp/nvhe/tlb.c @@ -132,10 +132,10 @@ static void exit_vmid_context(struct tlb_inv_context *cxt) else __load_host_stage2(); - if (cpus_have_final_cap(ARM64_WORKAROUND_SPECULATIVE_AT)) { - /* Ensure write of the old VMID */ - isb(); + /* Ensure write of the old VMID */ + isb(); + if (cpus_have_final_cap(ARM64_WORKAROUND_SPECULATIVE_AT)) { if (!(cxt->sctlr & SCTLR_ELx_M)) { write_sysreg_el1(cxt->sctlr, SYS_SCTLR); isb();
When the target context passed to enter_vmid_context() matches the current running context, the function returns early without manipulating the registers of the stage-2 MMU. This can result in a stale VMID due to the lack of an ISB instruction in exit_vmid_context() after writing the VTTBR when ARM64_WORKAROUND_SPECULATIVE_AT is not enabled. For example, with pKVM enabled: // Initially running in host context enter_vmid_context(guest); -> __load_stage2(guest); isb // Writes VTCR & VTTBR exit_vmid_context(guest); -> __load_stage2(host); // Restores VTCR & VTTBR enter_vmid_context(host); -> Returns early as we're already in host context tlbi vmalls12e1is // !!! Can use the stale VMID as we // haven't performed context // synchronisation since restoring // VTTBR.VMID Add an unconditional ISB instruction to exit_vmid_context() after restoring the VTTBR. This already existed for the ARM64_WORKAROUND_SPECULATIVE_AT path, so we can simply hoist that onto the common path. Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Fuad Tabba <tabba@google.com> Fixes: 58f3b0fc3b87 ("KVM: arm64: Support TLB invalidation in guest context") Signed-off-by: Will Deacon <will@kernel.org> --- arch/arm64/kvm/hyp/nvhe/tlb.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)