[59/89] KVM: arm64: Do not support MTE for protected VMs

Message ID	20220519134204.5379-60-will@kernel.org (mailing list archive)
State	New, archived
From: Will Deacon <will@kernel.org>
To: kvmarm@lists.cs.columbia.edu
Cc: Will Deacon <will@kernel.org>, Ard Biesheuvel <ardb@kernel.org>, Sean Christopherson <seanjc@google.com>, Alexandru Elisei <alexandru.elisei@arm.com>, Andy Lutomirski <luto@amacapital.net>, Catalin Marinas <catalin.marinas@arm.com>, James Morse <james.morse@arm.com>, Chao Peng <chao.p.peng@linux.intel.com>, Quentin Perret <qperret@google.com>, Suzuki K Poulose <suzuki.poulose@arm.com>, Michael Roth <michael.roth@amd.com>, Mark Rutland <mark.rutland@arm.com>, Fuad Tabba <tabba@google.com>, Oliver Upton <oupton@google.com>, Marc Zyngier <maz@kernel.org>, kernel-team@android.com, kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org
Subject: [PATCH 59/89] KVM: arm64: Do not support MTE for protected VMs
Date: Thu, 19 May 2022 14:41:34 +0100
In-Reply-To: <20220519134204.5379-1-will@kernel.org>
Series	KVM: arm64: Base support for the pKVM hypervisor at EL2

Will Deacon May 19, 2022, 1:41 p.m. UTC

From: Fuad Tabba <tabba@google.com>

Return an error (-EINVAL) if trying to enable MTE on a protected
vm.

Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/arm64/kvm/arm.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
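
The behaviour described in the commit message can be modelled in a few lines of user-space C. This is only a sketch, not the kernel code: `struct vm_model`, `enable_mte_model()` and the `host_has_mte` parameter are hypothetical stand-ins for `struct kvm`, the `KVM_CAP_ARM_MTE` case of `kvm_vm_ioctl_enable_cap()` and `system_supports_mte()`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the few pieces of VM state the check reads. */
struct vm_model {
	bool protected_vm;	/* models kvm_vm_is_protected(kvm) */
	int created_vcpus;	/* models kvm->created_vcpus */
	bool mte_enabled;	/* set once the capability is accepted */
};

/*
 * Models the KVM_CAP_ARM_MTE case: refuse when the host lacks MTE, when
 * the VM is protected, or when vCPUs already exist.
 */
static int enable_mte_model(struct vm_model *vm, bool host_has_mte)
{
	if (!host_has_mte || vm->protected_vm || vm->created_vcpus)
		return -EINVAL;

	vm->mte_enabled = true;
	return 0;
}

/* Expected behaviour of the model, written as assertions. */
static void self_check(void)
{
	struct vm_model pvm = { .protected_vm = true };
	struct vm_model vm = { .protected_vm = false };

	assert(enable_mte_model(&pvm, true) == -EINVAL);
	assert(!pvm.mte_enabled);
	assert(enable_mte_model(&vm, true) == 0);
	assert(vm.mte_enabled);
}
```

In the model, a non-protected VM on the same host still gets MTE, which matches the scope stated for the patch.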

Peter Collingbourne May 26, 2022, 8:08 p.m. UTC | #1

On Thu, May 19, 2022 at 7:40 AM Will Deacon <will@kernel.org> wrote:
>
> From: Fuad Tabba <tabba@google.com>
>
> Return an error (-EINVAL) if trying to enable MTE on a protected
> vm.

I think this commit message needs more explanation as to why MTE is
not currently supported in protected VMs.

Peter

Fuad Tabba May 27, 2022, 7:55 a.m. UTC | #2

Hi Peter,

On Thu, May 26, 2022 at 9:08 PM Peter Collingbourne <pcc@google.com> wrote:
>
> On Thu, May 19, 2022 at 7:40 AM Will Deacon <will@kernel.org> wrote:
> >
> > From: Fuad Tabba <tabba@google.com>
> >
> > Return an error (-EINVAL) if trying to enable MTE on a protected
> > vm.
>
> I think this commit message needs more explanation as to why MTE is
> not currently supported in protected VMs.

Yes, we need to explain this more. Basically this is an extension of
restricting features for protected VMs done earlier [*].

Various VM feature configurations are allowed in KVM/arm64, each requiring
specific handling logic to deal with traps, context-switching and potentially
emulation. Achieving feature parity in pKVM therefore requires either elevating
this logic to EL2 (and substantially increasing the TCB) or continuing to trust
the host handlers at EL1. Since neither of these options is especially
appealing, pKVM instead limits the CPU features exposed to a guest to a fixed
configuration based on the underlying hardware and which can mostly be provided
straightforwardly by EL2.

This of course can change in the future and we can support more
features for protected VMs as needed. We'll expand on this commit
message when we respin.

Also note that this only applies to protected VMs. Non-protected VMs
in protected mode support MTE.

Cheers,
/fuad

[*] https://lore.kernel.org/kvmarm/20210827101609.2808181-1-tabba@google.com/
>
> Peter
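
The "fixed configuration" approach described above can be illustrated with an allow mask applied to an ID register value. The sketch below is a user-space model under stated assumptions: the macro and function names are hypothetical, and the field positions (BT in bits [3:0], SSBS in [7:4], MTE in [11:8] of ID_AA64PFR1_EL1) follow my reading of the Arm architecture and should be checked against the kernel's sysreg definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Each ID register field is four bits wide. */
#define FIELD_MASK(shift)	(UINT64_C(0xf) << (shift))

/* Assumed ID_AA64PFR1_EL1 field positions (hypothetical macro names). */
#define PFR1_BT_SHIFT	0
#define PFR1_SSBS_SHIFT	4
#define PFR1_MTE_SHIFT	8

/* Allow list: only BT and SSBS are exposed to a protected guest. */
#define PVM_PFR1_ALLOW_MODEL \
	(FIELD_MASK(PFR1_BT_SHIFT) | FIELD_MASK(PFR1_SSBS_SHIFT))

/* The guest-visible value is the hardware value with disallowed fields zeroed. */
static uint64_t pvm_visible_pfr1(uint64_t hw_value)
{
	return hw_value & PVM_PFR1_ALLOW_MODEL;
}

/* Expected behaviour of the model, written as assertions. */
static void self_check(void)
{
	/* Hardware reports MTE = 2, SSBS = 2, BT = 1. */
	uint64_t hw = UINT64_C(0x221);

	assert(pvm_visible_pfr1(hw) == UINT64_C(0x021));
	assert((pvm_visible_pfr1(hw) & FIELD_MASK(PFR1_MTE_SHIFT)) == 0);
}
```

Because the mask is an allow list, a feature that is not named in it reads as absent to the guest by default; supporting a new feature means adding its field to the mask and providing the EL2 handling that goes with it.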

Peter Collingbourne June 3, 2022, 3 a.m. UTC | #3

Hi Fuad,

On Fri, May 27, 2022 at 08:55:42AM +0100, Fuad Tabba wrote:
> Hi Peter,
> 
> On Thu, May 26, 2022 at 9:08 PM Peter Collingbourne <pcc@google.com> wrote:
> >
> > On Thu, May 19, 2022 at 7:40 AM Will Deacon <will@kernel.org> wrote:
> > >
> > > From: Fuad Tabba <tabba@google.com>
> > >
> > > Return an error (-EINVAL) if trying to enable MTE on a protected
> > > vm.
> >
> > I think this commit message needs more explanation as to why MTE is
> > not currently supported in protected VMs.
> 
> Yes, we need to explain this more. Basically this is an extension of
> restricting features for protected VMs done earlier [*].
> 
> Various VM feature configurations are allowed in KVM/arm64, each requiring
> specific handling logic to deal with traps, context-switching and potentially
> emulation. Achieving feature parity in pKVM therefore requires either elevating
> this logic to EL2 (and substantially increasing the TCB) or continuing to trust
> the host handlers at EL1. Since neither of these options is especially
> appealing, pKVM instead limits the CPU features exposed to a guest to a fixed
> configuration based on the underlying hardware and which can mostly be provided
> straightforwardly by EL2.
> 
> This of course can change in the future and we can support more
> features for protected VMs as needed. We'll expand on this commit
> message when we respin.
> 
> Also note that this only applies to protected VMs. Non-protected VMs
> in protected mode support MTE.

I see. In this case unless I'm missing something the EL2 side seems
quite trivial though (flipping some bits in HCR_EL2). The patch below
(in place of this one) seems to make MTE work in my test environment
(patched [1] crosvm on Android in MTE-enabled QEMU).

[1] https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/3689015

From c87965cd14515586d487872486e7670874209113 Mon Sep 17 00:00:00 2001
From: Peter Collingbourne <pcc@google.com>
Date: Thu, 2 Jun 2022 19:16:02 -0700
Subject: [PATCH] arm64: support MTE in protected VMs

Enable HCR_EL2.ATA while running a vCPU with MTE enabled.

To avoid exposing MTE tags from the host to protected VMs, sanitize
tags before donating pages.

Signed-off-by: Peter Collingbourne <pcc@google.com>
---
 arch/arm64/include/asm/kvm_pkvm.h | 4 +++-
 arch/arm64/kvm/hyp/nvhe/pkvm.c    | 6 +++---
 arch/arm64/kvm/mmu.c              | 4 +++-
 3 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_pkvm.h b/arch/arm64/include/asm/kvm_pkvm.h
index 952e3c3fa32d..9ca9296f2a25 100644
--- a/arch/arm64/include/asm/kvm_pkvm.h
+++ b/arch/arm64/include/asm/kvm_pkvm.h
@@ -73,10 +73,12 @@ void kvm_shadow_destroy(struct kvm *kvm);
  * Allow for protected VMs:
  * - Branch Target Identification
  * - Speculative Store Bypassing
+ * - Memory Tagging Extension
  */
 #define PVM_ID_AA64PFR1_ALLOW (\
 	ARM64_FEATURE_MASK(ID_AA64PFR1_BT) | \
-	ARM64_FEATURE_MASK(ID_AA64PFR1_SSBS) \
+	ARM64_FEATURE_MASK(ID_AA64PFR1_SSBS) | \
+	ARM64_FEATURE_MASK(ID_AA64PFR1_MTE) \
 	)
 
 /*
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index e33ba9067d7b..46ddd9093ac7 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -88,7 +88,7 @@ static void pvm_init_traps_aa64pfr1(struct kvm_vcpu *vcpu)
 	/* Memory Tagging: Trap and Treat as Untagged if not supported. */
 	if (!FIELD_GET(ARM64_FEATURE_MASK(ID_AA64PFR1_MTE), feature_ids)) {
 		hcr_set |= HCR_TID5;
-		hcr_clear |= HCR_DCT | HCR_ATA;
+		hcr_clear |= HCR_ATA;
 	}
 
 	vcpu->arch.hcr_el2 |= hcr_set;
@@ -179,8 +179,8 @@ static void pvm_init_trap_regs(struct kvm_vcpu *vcpu)
 	 * - Feature id registers: to control features exposed to guests
 	 * - Implementation-defined features
 	 */
-	vcpu->arch.hcr_el2 = HCR_GUEST_FLAGS |
-			     HCR_TID3 | HCR_TACR | HCR_TIDCP | HCR_TID1;
+	vcpu->arch.hcr_el2 = HCR_GUEST_FLAGS | HCR_TID3 | HCR_TACR | HCR_TIDCP |
+			     HCR_TID1 | HCR_ATA;
 
 	if (cpus_have_const_cap(ARM64_HAS_RAS_EXTN)) {
 		/* route synchronous external abort exceptions to EL2 */
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 392ff7b2362d..f513852357f7 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1206,8 +1206,10 @@ static int pkvm_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		goto dec_account;
 	}
 
-	write_lock(&kvm->mmu_lock);
 	pfn = page_to_pfn(page);
+	sanitise_mte_tags(kvm, pfn, PAGE_SIZE);
+
+	write_lock(&kvm->mmu_lock);
 	ret = pkvm_host_map_guest(pfn, fault_ipa >> PAGE_SHIFT);
 	if (ret) {
 		if (ret == -EAGAIN)
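
The HCR_EL2 handling in the proposal above can be modelled as follows. This is a sketch only: `pvm_mte_hcr_model()` is a hypothetical helper, and the bit positions (ATA at bit 56, TID5 at bit 58) are my understanding of the architecture rather than something stated in the thread. It mirrors the proposed logic: ATA is part of the base value, and if the MTE field is not in the guest's allowed feature set, ID group 5 accesses are trapped and ATA is cleared.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed HCR_EL2 bit positions (check against asm/kvm_arm.h). */
#define HCR_ATA_MODEL	(UINT64_C(1) << 56)	/* allocation tag access */
#define HCR_TID5_MODEL	(UINT64_C(1) << 58)	/* trap ID group 5 */

/*
 * Apply the set/clear pattern used in the diff: collect bits to set and
 * bits to clear, then fold both into the vCPU's HCR_EL2 value.
 * mte_field is the guest-visible ID_AA64PFR1_EL1.MTE value.
 */
static uint64_t pvm_mte_hcr_model(uint64_t hcr, unsigned int mte_field)
{
	uint64_t hcr_set = 0, hcr_clear = 0;

	if (!mte_field) {
		hcr_set |= HCR_TID5_MODEL;
		hcr_clear |= HCR_ATA_MODEL;
	}

	hcr |= hcr_set;
	hcr &= ~hcr_clear;
	return hcr;
}

/* Expected behaviour of the model, written as assertions. */
static void self_check(void)
{
	/* MTE hidden from the guest: ATA dropped, TID5 trap enabled. */
	assert(pvm_mte_hcr_model(HCR_ATA_MODEL, 0) == HCR_TID5_MODEL);
	/* MTE exposed to the guest: base value is left untouched. */
	assert(pvm_mte_hcr_model(HCR_ATA_MODEL, 2) == HCR_ATA_MODEL);
}
```

The model covers only the register bits. It says nothing about who sanitises tags or how guest tag storage is protected from the host.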

Marc Zyngier June 4, 2022, 8:26 a.m. UTC | #4

On Fri, 03 Jun 2022 04:00:29 +0100,
Peter Collingbourne <pcc@google.com> wrote:
> 
> Hi Fuad,
> 
> On Fri, May 27, 2022 at 08:55:42AM +0100, Fuad Tabba wrote:
> > Hi Peter,
> > 
> > On Thu, May 26, 2022 at 9:08 PM Peter Collingbourne <pcc@google.com> wrote:
> > >
> > > On Thu, May 19, 2022 at 7:40 AM Will Deacon <will@kernel.org> wrote:
> > > >
> > > > From: Fuad Tabba <tabba@google.com>
> > > >
> > > > Return an error (-EINVAL) if trying to enable MTE on a protected
> > > > vm.
> > >
> > > I think this commit message needs more explanation as to why MTE is
> > > not currently supported in protected VMs.
> > 
> > Yes, we need to explain this more. Basically this is an extension of
> > restricting features for protected VMs done earlier [*].
> > 
> > Various VM feature configurations are allowed in KVM/arm64, each requiring
> > specific handling logic to deal with traps, context-switching and potentially
> > emulation. Achieving feature parity in pKVM therefore requires either elevating
> > this logic to EL2 (and substantially increasing the TCB) or continuing to trust
> > the host handlers at EL1. Since neither of these options is especially
> > appealing, pKVM instead limits the CPU features exposed to a guest to a fixed
> > configuration based on the underlying hardware and which can mostly be provided
> > straightforwardly by EL2.
> > 
> > This of course can change in the future and we can support more
> > features for protected VMs as needed. We'll expand on this commit
> > message when we respin.
> > 
> > Also note that this only applies to protected VMs. Non-protected VMs
> > in protected mode support MTE.
> 
> I see. In this case unless I'm missing something the EL2 side seems
> quite trivial though (flipping some bits in HCR_EL2). The patch below
> (in place of this one) seems to make MTE work in my test environment
> (patched [1] crosvm on Android in MTE-enabled QEMU).
> 
> [1] https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/3689015
> 
> From c87965cd14515586d487872486e7670874209113 Mon Sep 17 00:00:00 2001
> From: Peter Collingbourne <pcc@google.com>
> Date: Thu, 2 Jun 2022 19:16:02 -0700
> Subject: [PATCH] arm64: support MTE in protected VMs
> 
> Enable HCR_EL2.ATA while running a vCPU with MTE enabled.
> 
> To avoid exposing MTE tags from the host to protected VMs, sanitize
> tags before donating pages.
> 
> Signed-off-by: Peter Collingbourne <pcc@google.com>
> ---
>  arch/arm64/include/asm/kvm_pkvm.h | 4 +++-
>  arch/arm64/kvm/hyp/nvhe/pkvm.c    | 6 +++---
>  arch/arm64/kvm/mmu.c              | 4 +++-
>  3 files changed, 9 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_pkvm.h b/arch/arm64/include/asm/kvm_pkvm.h
> index 952e3c3fa32d..9ca9296f2a25 100644
> --- a/arch/arm64/include/asm/kvm_pkvm.h
> +++ b/arch/arm64/include/asm/kvm_pkvm.h
> @@ -73,10 +73,12 @@ void kvm_shadow_destroy(struct kvm *kvm);
>   * Allow for protected VMs:
>   * - Branch Target Identification
>   * - Speculative Store Bypassing
> + * - Memory Tagging Extension
>   */
>  #define PVM_ID_AA64PFR1_ALLOW (\
>  	ARM64_FEATURE_MASK(ID_AA64PFR1_BT) | \
> -	ARM64_FEATURE_MASK(ID_AA64PFR1_SSBS) \
> +	ARM64_FEATURE_MASK(ID_AA64PFR1_SSBS) | \
> +	ARM64_FEATURE_MASK(ID_AA64PFR1_MTE) \
>  	)
>  
>  /*
> diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
> index e33ba9067d7b..46ddd9093ac7 100644
> --- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
> +++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
> @@ -88,7 +88,7 @@ static void pvm_init_traps_aa64pfr1(struct kvm_vcpu *vcpu)
>  	/* Memory Tagging: Trap and Treat as Untagged if not supported. */
>  	if (!FIELD_GET(ARM64_FEATURE_MASK(ID_AA64PFR1_MTE), feature_ids)) {
>  		hcr_set |= HCR_TID5;
> -		hcr_clear |= HCR_DCT | HCR_ATA;
> +		hcr_clear |= HCR_ATA;
>  	}
>  
>  	vcpu->arch.hcr_el2 |= hcr_set;
> @@ -179,8 +179,8 @@ static void pvm_init_trap_regs(struct kvm_vcpu *vcpu)
>  	 * - Feature id registers: to control features exposed to guests
>  	 * - Implementation-defined features
>  	 */
> -	vcpu->arch.hcr_el2 = HCR_GUEST_FLAGS |
> -			     HCR_TID3 | HCR_TACR | HCR_TIDCP | HCR_TID1;
> +	vcpu->arch.hcr_el2 = HCR_GUEST_FLAGS | HCR_TID3 | HCR_TACR | HCR_TIDCP |
> +			     HCR_TID1 | HCR_ATA;
>  
>  	if (cpus_have_const_cap(ARM64_HAS_RAS_EXTN)) {
>  		/* route synchronous external abort exceptions to EL2 */
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 392ff7b2362d..f513852357f7 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1206,8 +1206,10 @@ static int pkvm_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  		goto dec_account;
>  	}
>  
> -	write_lock(&kvm->mmu_lock);
>  	pfn = page_to_pfn(page);
> +	sanitise_mte_tags(kvm, pfn, PAGE_SIZE);
> +
> +	write_lock(&kvm->mmu_lock);

Is it really safe to rely on the host to clear the tags? My gut
feeling says that it isn't. If it is required, we cannot leave this
responsibility to the host, and this logic must be moved to EL2. And
if it isn't, then we should drop it.

>  	ret = pkvm_host_map_guest(pfn, fault_ipa >> PAGE_SHIFT);
>  	if (ret) {
>  		if (ret == -EAGAIN)

But the bigger picture here is what ensures that the host cannot mess
with the guest tags? I don't think we have any mechanism to
guarantee that, especially on systems where the tags are only a memory
carve-out, which the host could map and change at will.

In any case, this isn't the time to pile new features on top of
pKVM. The current plan is to not support MTE at all, and only do it
once we have a definitive story on page donation (which as you may
have noticed, is pretty hacky). I don't see any compelling reason to
add MTE to the mix until this is solved.

Thanks,

	M.

Peter Collingbourne June 7, 2022, 12:20 a.m. UTC | #5

On Sat, Jun 4, 2022 at 1:26 AM Marc Zyngier <maz@kernel.org> wrote:
>
> On Fri, 03 Jun 2022 04:00:29 +0100,
> Peter Collingbourne <pcc@google.com> wrote:
> >
> > Hi Fuad,
> >
> > On Fri, May 27, 2022 at 08:55:42AM +0100, Fuad Tabba wrote:
> > > Hi Peter,
> > >
> > > On Thu, May 26, 2022 at 9:08 PM Peter Collingbourne <pcc@google.com> wrote:
> > > >
> > > > On Thu, May 19, 2022 at 7:40 AM Will Deacon <will@kernel.org> wrote:
> > > > >
> > > > > From: Fuad Tabba <tabba@google.com>
> > > > >
> > > > > Return an error (-EINVAL) if trying to enable MTE on a protected
> > > > > vm.
> > > >
> > > > I think this commit message needs more explanation as to why MTE is
> > > > not currently supported in protected VMs.
> > >
> > > Yes, we need to explain this more. Basically this is an extension of
> > > restricting features for protected VMs done earlier [*].
> > >
> > > Various VM feature configurations are allowed in KVM/arm64, each requiring
> > > specific handling logic to deal with traps, context-switching and potentially
> > > emulation. Achieving feature parity in pKVM therefore requires either elevating
> > > this logic to EL2 (and substantially increasing the TCB) or continuing to trust
> > > the host handlers at EL1. Since neither of these options is especially
> > > appealing, pKVM instead limits the CPU features exposed to a guest to a fixed
> > > configuration based on the underlying hardware and which can mostly be provided
> > > straightforwardly by EL2.
> > >
> > > This of course can change in the future and we can support more
> > > features for protected VMs as needed. We'll expand on this commit
> > > message when we respin.
> > >
> > > Also note that this only applies to protected VMs. Non-protected VMs
> > > in protected mode support MTE.
> >
> > I see. In this case unless I'm missing something the EL2 side seems
> > quite trivial though (flipping some bits in HCR_EL2). The patch below
> > (in place of this one) seems to make MTE work in my test environment
> > (patched [1] crosvm on Android in MTE-enabled QEMU).
> >
> > [1] https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/3689015
> >
> > From c87965cd14515586d487872486e7670874209113 Mon Sep 17 00:00:00 2001
> > From: Peter Collingbourne <pcc@google.com>
> > Date: Thu, 2 Jun 2022 19:16:02 -0700
> > Subject: [PATCH] arm64: support MTE in protected VMs
> >
> > Enable HCR_EL2.ATA while running a vCPU with MTE enabled.
> >
> > To avoid exposing MTE tags from the host to protected VMs, sanitize
> > tags before donating pages.
> >
> > Signed-off-by: Peter Collingbourne <pcc@google.com>
> > ---
> >  arch/arm64/include/asm/kvm_pkvm.h | 4 +++-
> >  arch/arm64/kvm/hyp/nvhe/pkvm.c    | 6 +++---
> >  arch/arm64/kvm/mmu.c              | 4 +++-
> >  3 files changed, 9 insertions(+), 5 deletions(-)
> >
> > diff --git a/arch/arm64/include/asm/kvm_pkvm.h b/arch/arm64/include/asm/kvm_pkvm.h
> > index 952e3c3fa32d..9ca9296f2a25 100644
> > --- a/arch/arm64/include/asm/kvm_pkvm.h
> > +++ b/arch/arm64/include/asm/kvm_pkvm.h
> > @@ -73,10 +73,12 @@ void kvm_shadow_destroy(struct kvm *kvm);
> >   * Allow for protected VMs:
> >   * - Branch Target Identification
> >   * - Speculative Store Bypassing
> > + * - Memory Tagging Extension
> >   */
> >  #define PVM_ID_AA64PFR1_ALLOW (\
> >       ARM64_FEATURE_MASK(ID_AA64PFR1_BT) | \
> > -     ARM64_FEATURE_MASK(ID_AA64PFR1_SSBS) \
> > +     ARM64_FEATURE_MASK(ID_AA64PFR1_SSBS) | \
> > +     ARM64_FEATURE_MASK(ID_AA64PFR1_MTE) \
> >       )
> >
> >  /*
> > diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
> > index e33ba9067d7b..46ddd9093ac7 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
> > @@ -88,7 +88,7 @@ static void pvm_init_traps_aa64pfr1(struct kvm_vcpu *vcpu)
> >       /* Memory Tagging: Trap and Treat as Untagged if not supported. */
> >       if (!FIELD_GET(ARM64_FEATURE_MASK(ID_AA64PFR1_MTE), feature_ids)) {
> >               hcr_set |= HCR_TID5;
> > -             hcr_clear |= HCR_DCT | HCR_ATA;
> > +             hcr_clear |= HCR_ATA;
> >       }
> >
> >       vcpu->arch.hcr_el2 |= hcr_set;
> > @@ -179,8 +179,8 @@ static void pvm_init_trap_regs(struct kvm_vcpu *vcpu)
> >        * - Feature id registers: to control features exposed to guests
> >        * - Implementation-defined features
> >        */
> > -     vcpu->arch.hcr_el2 = HCR_GUEST_FLAGS |
> > -                          HCR_TID3 | HCR_TACR | HCR_TIDCP | HCR_TID1;
> > +     vcpu->arch.hcr_el2 = HCR_GUEST_FLAGS | HCR_TID3 | HCR_TACR | HCR_TIDCP |
> > +                          HCR_TID1 | HCR_ATA;
> >
> >       if (cpus_have_const_cap(ARM64_HAS_RAS_EXTN)) {
> >               /* route synchronous external abort exceptions to EL2 */
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index 392ff7b2362d..f513852357f7 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -1206,8 +1206,10 @@ static int pkvm_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >               goto dec_account;
> >       }
> >
> > -     write_lock(&kvm->mmu_lock);
> >       pfn = page_to_pfn(page);
> > +     sanitise_mte_tags(kvm, pfn, PAGE_SIZE);
> > +
> > +     write_lock(&kvm->mmu_lock);
>
> Is it really safe to rely on the host to clear the tags? My gut
> feeling says that it isn't. If it is required, we cannot leave this
> responsibility to the host, and this logic must be moved to EL2. And
> if it isn't, then we should drop it.

The goal here isn't to protect the guest. It's already the case that
whatever the page contents are when the page is donated (from the
perspective of the KVM client), that's what the guest sees. That
applies to both data and (in non-protected VMs) tags.

The code that I added here is for solving a different problem, which
is to avoid exposing stale host state to the guest, which the KVM
client may not even be aware of. We sanitize pages before exposing
them in non-protected VMs for the same reason.

> >       ret = pkvm_host_map_guest(pfn, fault_ipa >> PAGE_SHIFT);
> >       if (ret) {
> >               if (ret == -EAGAIN)
>
> But the bigger picture here is what ensures that the host cannot mess
> with the guest tags? I don't think we have any mechanism to
> guarantee that, especially on systems where the tags are only a memory
> carve-out, which the host could map and change at will.

Right, I forgot about that. We probably only want to expose MTE to
guests if we have some indication (through the device tree or ACPI) of
how to protect the guest tag storage.

> In any case, this isn't the time to pile new features on top of
> pKVM. The current plan is to not support MTE at all, and only do it
> once we have a definitive story on page donation (which as you may
> have noticed, is pretty hacky). I don't see any compelling reason to
> add MTE to the mix until this is solved.

It sounds reasonable to land a basic set of features to begin with and
add MTE later. I'll develop my MTE-in-pKVM patch series as a followup
on top of this series.

Peter

Peter Collingbourne June 7, 2022, 12:42 a.m. UTC | #6

On Thu, May 19, 2022 at 7:40 AM Will Deacon <will@kernel.org> wrote:
>
> From: Fuad Tabba <tabba@google.com>
>
> Return an error (-EINVAL) if trying to enable MTE on a protected
> vm.
>
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
>  arch/arm64/kvm/arm.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 10e036bf06e3..8a1b4ba1dfa7 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -90,7 +90,9 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>                 break;
>         case KVM_CAP_ARM_MTE:
>                 mutex_lock(&kvm->lock);
> -               if (!system_supports_mte() || kvm->created_vcpus) {
> +               if (!system_supports_mte() ||
> +                   kvm_vm_is_protected(kvm) ||

Should this check be added to kvm_vm_ioctl_check_extension() as well?

Peter

Fuad Tabba June 8, 2022, 7:40 a.m. UTC | #7

Hi Peter,

On Tue, Jun 7, 2022 at 1:42 AM Peter Collingbourne <pcc@google.com> wrote:
>
> On Thu, May 19, 2022 at 7:40 AM Will Deacon <will@kernel.org> wrote:
> >
> > From: Fuad Tabba <tabba@google.com>
> >
> > Return an error (-EINVAL) if trying to enable MTE on a protected
> > vm.
> >
> > Signed-off-by: Fuad Tabba <tabba@google.com>
> > ---
> >  arch/arm64/kvm/arm.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > index 10e036bf06e3..8a1b4ba1dfa7 100644
> > --- a/arch/arm64/kvm/arm.c
> > +++ b/arch/arm64/kvm/arm.c
> > @@ -90,7 +90,9 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> >                 break;
> >         case KVM_CAP_ARM_MTE:
> >                 mutex_lock(&kvm->lock);
> > -               if (!system_supports_mte() || kvm->created_vcpus) {
> > +               if (!system_supports_mte() ||
> > +                   kvm_vm_is_protected(kvm) ||
>
> Should this check be added to kvm_vm_ioctl_check_extension() as well?

No need. kvm_vm_ioctl_check_extension() calls pkvm_check_extension()
for protected vms, which functions as an allow list rather than a
block list.

Cheers,
/fuad


> Peter

Peter Collingbourne June 8, 2022, 5:39 p.m. UTC | #8

On Wed, Jun 8, 2022 at 12:40 AM Fuad Tabba <tabba@google.com> wrote:
>
> Hi Peter,
>
> On Tue, Jun 7, 2022 at 1:42 AM Peter Collingbourne <pcc@google.com> wrote:
> >
> > On Thu, May 19, 2022 at 7:40 AM Will Deacon <will@kernel.org> wrote:
> > >
> > > From: Fuad Tabba <tabba@google.com>
> > >
> > > Return an error (-EINVAL) if trying to enable MTE on a protected
> > > vm.
> > >
> > > Signed-off-by: Fuad Tabba <tabba@google.com>
> > > ---
> > >  arch/arm64/kvm/arm.c | 4 +++-
> > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > > index 10e036bf06e3..8a1b4ba1dfa7 100644
> > > --- a/arch/arm64/kvm/arm.c
> > > +++ b/arch/arm64/kvm/arm.c
> > > @@ -90,7 +90,9 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> > >                 break;
> > >         case KVM_CAP_ARM_MTE:
> > >                 mutex_lock(&kvm->lock);
> > > -               if (!system_supports_mte() || kvm->created_vcpus) {
> > > +               if (!system_supports_mte() ||
> > > +                   kvm_vm_is_protected(kvm) ||
> >
> > Should this check be added to kvm_vm_ioctl_check_extension() as well?
>
> No need. kvm_vm_ioctl_check_extension() calls pkvm_check_extension()
> for protected vms, which functions as an allow list rather than a
> block list.

I see. I guess I got confused when reading the code because I saw this
in kvm_check_extension():

        case KVM_CAP_ARM_NISV_TO_USER:
                r = !kvm || !kvm_vm_is_protected(kvm);
                break;

This can probably be simplified to "r = 1;".

Peter
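
The allow-list dispatch discussed in this exchange, and why the `!kvm_vm_is_protected(kvm)` test looks redundant, can be sketched in user-space C. Everything here is a model with hypothetical names: the `enum cap_model` values, `struct vm_model` and the membership of the protected-VM allow list are illustrative and not taken from the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical capability identifiers for the model. */
enum cap_model { CAP_NISV_TO_USER, CAP_MTE, CAP_IRQCHIP };

struct vm_model {
	bool protected_vm;	/* models kvm_vm_is_protected(kvm) */
};

/* Allow list for protected VMs: anything not named here is rejected. */
static int pkvm_check_model(enum cap_model cap)
{
	switch (cap) {
	case CAP_IRQCHIP:	/* illustrative member only */
		return 1;
	default:
		return 0;
	}
}

/* Generic path, reached only for a NULL VM or a non-protected VM. */
static int generic_check_model(const struct vm_model *vm, enum cap_model cap,
			       bool host_has_mte)
{
	switch (cap) {
	case CAP_NISV_TO_USER:
		/* Always 1 on this path, given the dispatch below. */
		return !vm || !vm->protected_vm;
	case CAP_MTE:
		return host_has_mte;
	case CAP_IRQCHIP:
		return 1;
	}
	return 0;
}

/* Protected VMs never reach the generic path. */
static int check_extension_model(const struct vm_model *vm, enum cap_model cap,
				 bool host_has_mte)
{
	if (vm && vm->protected_vm)
		return pkvm_check_model(cap);
	return generic_check_model(vm, cap, host_has_mte);
}

/* Expected behaviour of the model, written as assertions. */
static void self_check(void)
{
	struct vm_model pvm = { .protected_vm = true };
	struct vm_model vm = { .protected_vm = false };

	/* MTE is not on the allow list, so no explicit block is needed. */
	assert(check_extension_model(&pvm, CAP_MTE, true) == 0);
	assert(check_extension_model(&vm, CAP_MTE, true) == 1);
	/* On the generic path the NISV expression cannot evaluate to 0. */
	assert(check_extension_model(NULL, CAP_NISV_TO_USER, true) == 1);
	assert(check_extension_model(&vm, CAP_NISV_TO_USER, true) == 1);
}
```

Under these assumptions the `CAP_NISV_TO_USER` expression in the generic path can only yield 1, which is consistent with the suggestion that it could be reduced to `r = 1;`.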

Catalin Marinas June 8, 2022, 6:41 p.m. UTC | #9

On Mon, Jun 06, 2022 at 05:20:39PM -0700, Peter Collingbourne wrote:
> On Sat, Jun 4, 2022 at 1:26 AM Marc Zyngier <maz@kernel.org> wrote:
> > But the bigger picture here is what ensures that the host cannot mess
> > with the guest tags? I don't think we have any mechanism to
> > guarantee that, especially on systems where the tags are only a memory
> > carve-out, which the host could map and change at will.
> 
> Right, I forgot about that. We probably only want to expose MTE to
> guests if we have some indication (through the device tree or ACPI) of
> how to protect the guest tag storage.

I think this would be useful irrespective of MTE. Some SoCs (though I
hope very rare these days) may allow for physical aliasing of RAM but if
the host stage 2 only protects one of the aliases, it's not of much use.

I am yet to fully understand how pKVM works but with the separation of
the hyp from the host kernel, it may have to actually parse the
DT/ACPI/EFI tables itself if it cannot rely on what the host kernel told
it. IIUC currently it creates an idmap at stage 2 for the host kernel,
only unmapped if the memory was assigned to a guest. But not sure what
happens with the rest of the host physical address space (devices etc.),
I presume they are fully accessible by the host kernel in stage 2.
