diff mbox series

[59/89] KVM: arm64: Do not support MTE for protected VMs

Message ID 20220519134204.5379-60-will@kernel.org (mailing list archive)
State New, archived
Headers show
Series KVM: arm64: Base support for the pKVM hypervisor at EL2 | expand

Commit Message

Will Deacon May 19, 2022, 1:41 p.m. UTC
From: Fuad Tabba <tabba@google.com>

Return an error (-EINVAL) if trying to enable MTE on a protected
vm.

Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/arm64/kvm/arm.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Peter Collingbourne May 26, 2022, 8:08 p.m. UTC | #1
On Thu, May 19, 2022 at 7:40 AM Will Deacon <will@kernel.org> wrote:
>
> From: Fuad Tabba <tabba@google.com>
>
> Return an error (-EINVAL) if trying to enable MTE on a protected
> vm.

I think this commit message needs more explanation as to why MTE is
not currently supported in protected VMs.

Peter
Fuad Tabba May 27, 2022, 7:55 a.m. UTC | #2
Hi Peter,

On Thu, May 26, 2022 at 9:08 PM Peter Collingbourne <pcc@google.com> wrote:
>
> On Thu, May 19, 2022 at 7:40 AM Will Deacon <will@kernel.org> wrote:
> >
> > From: Fuad Tabba <tabba@google.com>
> >
> > Return an error (-EINVAL) if trying to enable MTE on a protected
> > vm.
>
> I think this commit message needs more explanation as to why MTE is
> not currently supported in protected VMs.

Yes, we need to explain this more. Basically this is an extension of
restricting features for protected VMs done earlier [*].

Various VM feature configurations are allowed in KVM/arm64, each requiring
specific handling logic to deal with traps, context-switching and potentially
emulation. Achieving feature parity in pKVM therefore requires either elevating
this logic to EL2 (and substantially increasing the TCB) or continuing to trust
the host handlers at EL1. Since neither of these options are especially
appealing, pKVM instead limits the CPU features exposed to a guest to a fixed
configuration based on the underlying hardware and which can mostly be provided
straightforwardly by EL2.

This of course can change in the future and we can support more
features for protected VMs as needed. We'll expand on this commit
message when we respin.

Also note that this only applies to protected VMs. Non-protected VMs
in protected mode support MTE.

Cheers,
/fuad

[*] https://lore.kernel.org/kvmarm/20210827101609.2808181-1-tabba@google.com/
>
> Peter
Peter Collingbourne June 3, 2022, 3 a.m. UTC | #3
Hi Fuad,

On Fri, May 27, 2022 at 08:55:42AM +0100, Fuad Tabba wrote:
> Hi Peter,
> 
> On Thu, May 26, 2022 at 9:08 PM Peter Collingbourne <pcc@google.com> wrote:
> >
> > On Thu, May 19, 2022 at 7:40 AM Will Deacon <will@kernel.org> wrote:
> > >
> > > From: Fuad Tabba <tabba@google.com>
> > >
> > > Return an error (-EINVAL) if trying to enable MTE on a protected
> > > vm.
> >
> > I think this commit message needs more explanation as to why MTE is
> > not currently supported in protected VMs.
> 
> Yes, we need to explain this more. Basically this is an extension of
> restricting features for protected VMs done earlier [*].
> 
> Various VM feature configurations are allowed in KVM/arm64, each requiring
> specific handling logic to deal with traps, context-switching and potentially
> emulation. Achieving feature parity in pKVM therefore requires either elevating
> this logic to EL2 (and substantially increasing the TCB) or continuing to trust
> the host handlers at EL1. Since neither of these options are especially
> appealing, pKVM instead limits the CPU features exposed to a guest to a fixed
> configuration based on the underlying hardware and which can mostly be provided
> straightforwardly by EL2.
> 
> This of course can change in the future and we can support more
> features for protected VMs as needed. We'll expand on this commit
> message when we respin.
> 
> Also note that this only applies to protected VMs. Non-protected VMs
> in protected mode support MTE.

I see. In this case unless I'm missing something the EL2 side seems
quite trivial though (flipping some bits in HCR_EL2). The patch below
(in place of this one) seems to make MTE work in my test environment
(patched [1] crosvm on Android in MTE-enabled QEMU).

[1] https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/3689015

From c87965cd14515586d487872486e7670874209113 Mon Sep 17 00:00:00 2001
From: Peter Collingbourne <pcc@google.com>
Date: Thu, 2 Jun 2022 19:16:02 -0700
Subject: [PATCH] arm64: support MTE in protected VMs

Enable HCR_EL2.ATA while running a vCPU with MTE enabled.

To avoid exposing MTE tags from the host to protected VMs, sanitize
tags before donating pages.

Signed-off-by: Peter Collingbourne <pcc@google.com>
---
 arch/arm64/include/asm/kvm_pkvm.h | 4 +++-
 arch/arm64/kvm/hyp/nvhe/pkvm.c    | 6 +++---
 arch/arm64/kvm/mmu.c              | 4 +++-
 3 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_pkvm.h b/arch/arm64/include/asm/kvm_pkvm.h
index 952e3c3fa32d..9ca9296f2a25 100644
--- a/arch/arm64/include/asm/kvm_pkvm.h
+++ b/arch/arm64/include/asm/kvm_pkvm.h
@@ -73,10 +73,12 @@ void kvm_shadow_destroy(struct kvm *kvm);
  * Allow for protected VMs:
  * - Branch Target Identification
  * - Speculative Store Bypassing
+ * - Memory Tagging Extension
  */
 #define PVM_ID_AA64PFR1_ALLOW (\
 	ARM64_FEATURE_MASK(ID_AA64PFR1_BT) | \
-	ARM64_FEATURE_MASK(ID_AA64PFR1_SSBS) \
+	ARM64_FEATURE_MASK(ID_AA64PFR1_SSBS) | \
+	ARM64_FEATURE_MASK(ID_AA64PFR1_MTE) \
 	)
 
 /*
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index e33ba9067d7b..46ddd9093ac7 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -88,7 +88,7 @@ static void pvm_init_traps_aa64pfr1(struct kvm_vcpu *vcpu)
 	/* Memory Tagging: Trap and Treat as Untagged if not supported. */
 	if (!FIELD_GET(ARM64_FEATURE_MASK(ID_AA64PFR1_MTE), feature_ids)) {
 		hcr_set |= HCR_TID5;
-		hcr_clear |= HCR_DCT | HCR_ATA;
+		hcr_clear |= HCR_ATA;
 	}
 
 	vcpu->arch.hcr_el2 |= hcr_set;
@@ -179,8 +179,8 @@ static void pvm_init_trap_regs(struct kvm_vcpu *vcpu)
 	 * - Feature id registers: to control features exposed to guests
 	 * - Implementation-defined features
 	 */
-	vcpu->arch.hcr_el2 = HCR_GUEST_FLAGS |
-			     HCR_TID3 | HCR_TACR | HCR_TIDCP | HCR_TID1;
+	vcpu->arch.hcr_el2 = HCR_GUEST_FLAGS | HCR_TID3 | HCR_TACR | HCR_TIDCP |
+			     HCR_TID1 | HCR_ATA;
 
 	if (cpus_have_const_cap(ARM64_HAS_RAS_EXTN)) {
 		/* route synchronous external abort exceptions to EL2 */
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 392ff7b2362d..f513852357f7 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1206,8 +1206,10 @@ static int pkvm_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		goto dec_account;
 	}
 
-	write_lock(&kvm->mmu_lock);
 	pfn = page_to_pfn(page);
+	sanitise_mte_tags(kvm, pfn, PAGE_SIZE);
+
+	write_lock(&kvm->mmu_lock);
 	ret = pkvm_host_map_guest(pfn, fault_ipa >> PAGE_SHIFT);
 	if (ret) {
 		if (ret == -EAGAIN)
Marc Zyngier June 4, 2022, 8:26 a.m. UTC | #4
On Fri, 03 Jun 2022 04:00:29 +0100,
Peter Collingbourne <pcc@google.com> wrote:
> 
> Hi Fuad,
> 
> On Fri, May 27, 2022 at 08:55:42AM +0100, Fuad Tabba wrote:
> > Hi Peter,
> > 
> > On Thu, May 26, 2022 at 9:08 PM Peter Collingbourne <pcc@google.com> wrote:
> > >
> > > On Thu, May 19, 2022 at 7:40 AM Will Deacon <will@kernel.org> wrote:
> > > >
> > > > From: Fuad Tabba <tabba@google.com>
> > > >
> > > > Return an error (-EINVAL) if trying to enable MTE on a protected
> > > > vm.
> > >
> > > I think this commit message needs more explanation as to why MTE is
> > > not currently supported in protected VMs.
> > 
> > Yes, we need to explain this more. Basically this is an extension of
> > restricting features for protected VMs done earlier [*].
> > 
> > Various VM feature configurations are allowed in KVM/arm64, each requiring
> > specific handling logic to deal with traps, context-switching and potentially
> > emulation. Achieving feature parity in pKVM therefore requires either elevating
> > this logic to EL2 (and substantially increasing the TCB) or continuing to trust
> > the host handlers at EL1. Since neither of these options are especially
> > appealing, pKVM instead limits the CPU features exposed to a guest to a fixed
> > configuration based on the underlying hardware and which can mostly be provided
> > straightforwardly by EL2.
> > 
> > This of course can change in the future and we can support more
> > features for protected VMs as needed. We'll expand on this commit
> > message when we respin.
> > 
> > Also note that this only applies to protected VMs. Non-protected VMs
> > in protected mode support MTE.
> 
> I see. In this case unless I'm missing something the EL2 side seems
> quite trivial though (flipping some bits in HCR_EL2). The patch below
> (in place of this one) seems to make MTE work in my test environment
> (patched [1] crosvm on Android in MTE-enabled QEMU).
> 
> [1] https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/3689015
> 
> From c87965cd14515586d487872486e7670874209113 Mon Sep 17 00:00:00 2001
> From: Peter Collingbourne <pcc@google.com>
> Date: Thu, 2 Jun 2022 19:16:02 -0700
> Subject: [PATCH] arm64: support MTE in protected VMs
> 
> Enable HCR_EL2.ATA while running a vCPU with MTE enabled.
> 
> To avoid exposing MTE tags from the host to protected VMs, sanitize
> tags before donating pages.
> 
> Signed-off-by: Peter Collingbourne <pcc@google.com>
> ---
>  arch/arm64/include/asm/kvm_pkvm.h | 4 +++-
>  arch/arm64/kvm/hyp/nvhe/pkvm.c    | 6 +++---
>  arch/arm64/kvm/mmu.c              | 4 +++-
>  3 files changed, 9 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_pkvm.h b/arch/arm64/include/asm/kvm_pkvm.h
> index 952e3c3fa32d..9ca9296f2a25 100644
> --- a/arch/arm64/include/asm/kvm_pkvm.h
> +++ b/arch/arm64/include/asm/kvm_pkvm.h
> @@ -73,10 +73,12 @@ void kvm_shadow_destroy(struct kvm *kvm);
>   * Allow for protected VMs:
>   * - Branch Target Identification
>   * - Speculative Store Bypassing
> + * - Memory Tagging Extension
>   */
>  #define PVM_ID_AA64PFR1_ALLOW (\
>  	ARM64_FEATURE_MASK(ID_AA64PFR1_BT) | \
> -	ARM64_FEATURE_MASK(ID_AA64PFR1_SSBS) \
> +	ARM64_FEATURE_MASK(ID_AA64PFR1_SSBS) | \
> +	ARM64_FEATURE_MASK(ID_AA64PFR1_MTE) \
>  	)
>  
>  /*
> diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
> index e33ba9067d7b..46ddd9093ac7 100644
> --- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
> +++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
> @@ -88,7 +88,7 @@ static void pvm_init_traps_aa64pfr1(struct kvm_vcpu *vcpu)
>  	/* Memory Tagging: Trap and Treat as Untagged if not supported. */
>  	if (!FIELD_GET(ARM64_FEATURE_MASK(ID_AA64PFR1_MTE), feature_ids)) {
>  		hcr_set |= HCR_TID5;
> -		hcr_clear |= HCR_DCT | HCR_ATA;
> +		hcr_clear |= HCR_ATA;
>  	}
>  
>  	vcpu->arch.hcr_el2 |= hcr_set;
> @@ -179,8 +179,8 @@ static void pvm_init_trap_regs(struct kvm_vcpu *vcpu)
>  	 * - Feature id registers: to control features exposed to guests
>  	 * - Implementation-defined features
>  	 */
> -	vcpu->arch.hcr_el2 = HCR_GUEST_FLAGS |
> -			     HCR_TID3 | HCR_TACR | HCR_TIDCP | HCR_TID1;
> +	vcpu->arch.hcr_el2 = HCR_GUEST_FLAGS | HCR_TID3 | HCR_TACR | HCR_TIDCP |
> +			     HCR_TID1 | HCR_ATA;
>  
>  	if (cpus_have_const_cap(ARM64_HAS_RAS_EXTN)) {
>  		/* route synchronous external abort exceptions to EL2 */
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 392ff7b2362d..f513852357f7 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1206,8 +1206,10 @@ static int pkvm_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  		goto dec_account;
>  	}
>  
> -	write_lock(&kvm->mmu_lock);
>  	pfn = page_to_pfn(page);
> +	sanitise_mte_tags(kvm, pfn, PAGE_SIZE);
> +
> +	write_lock(&kvm->mmu_lock);

Is it really safe to rely on the host to clear the tags? My guts
feeling says that it isn't. If it is required, we cannot leave this
responsibility to the host, and this logic must be moved to EL2. And
if it isn't, then we should drop it.

>  	ret = pkvm_host_map_guest(pfn, fault_ipa >> PAGE_SHIFT);
>  	if (ret) {
>  		if (ret == -EAGAIN)

But the bigger picture here is what ensures that the host cannot mess
with the guest tags? I don't think we have a any mechanism to
guarantee that, specially on systems where the tags are only a memory
carve-out, which the host could map and change at will.

In any case, this isn't the time to pile new features on top of
pKVM. The current plan is to not support MTE at all, and only do it
once we have a definitive story on page donation (which as you may
have noticed, is pretty hacky). I don't see any compelling reason to
add MTE to the mix until this is solved.

Thanks,

	M.
Peter Collingbourne June 7, 2022, 12:20 a.m. UTC | #5
On Sat, Jun 4, 2022 at 1:26 AM Marc Zyngier <maz@kernel.org> wrote:
>
> On Fri, 03 Jun 2022 04:00:29 +0100,
> Peter Collingbourne <pcc@google.com> wrote:
> >
> > Hi Fuad,
> >
> > On Fri, May 27, 2022 at 08:55:42AM +0100, Fuad Tabba wrote:
> > > Hi Peter,
> > >
> > > On Thu, May 26, 2022 at 9:08 PM Peter Collingbourne <pcc@google.com> wrote:
> > > >
> > > > On Thu, May 19, 2022 at 7:40 AM Will Deacon <will@kernel.org> wrote:
> > > > >
> > > > > From: Fuad Tabba <tabba@google.com>
> > > > >
> > > > > Return an error (-EINVAL) if trying to enable MTE on a protected
> > > > > vm.
> > > >
> > > > I think this commit message needs more explanation as to why MTE is
> > > > not currently supported in protected VMs.
> > >
> > > Yes, we need to explain this more. Basically this is an extension of
> > > restricting features for protected VMs done earlier [*].
> > >
> > > Various VM feature configurations are allowed in KVM/arm64, each requiring
> > > specific handling logic to deal with traps, context-switching and potentially
> > > emulation. Achieving feature parity in pKVM therefore requires either elevating
> > > this logic to EL2 (and substantially increasing the TCB) or continuing to trust
> > > the host handlers at EL1. Since neither of these options are especially
> > > appealing, pKVM instead limits the CPU features exposed to a guest to a fixed
> > > configuration based on the underlying hardware and which can mostly be provided
> > > straightforwardly by EL2.
> > >
> > > This of course can change in the future and we can support more
> > > features for protected VMs as needed. We'll expand on this commit
> > > message when we respin.
> > >
> > > Also note that this only applies to protected VMs. Non-protected VMs
> > > in protected mode support MTE.
> >
> > I see. In this case unless I'm missing something the EL2 side seems
> > quite trivial though (flipping some bits in HCR_EL2). The patch below
> > (in place of this one) seems to make MTE work in my test environment
> > (patched [1] crosvm on Android in MTE-enabled QEMU).
> >
> > [1] https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/3689015
> >
> > From c87965cd14515586d487872486e7670874209113 Mon Sep 17 00:00:00 2001
> > From: Peter Collingbourne <pcc@google.com>
> > Date: Thu, 2 Jun 2022 19:16:02 -0700
> > Subject: [PATCH] arm64: support MTE in protected VMs
> >
> > Enable HCR_EL2.ATA while running a vCPU with MTE enabled.
> >
> > To avoid exposing MTE tags from the host to protected VMs, sanitize
> > tags before donating pages.
> >
> > Signed-off-by: Peter Collingbourne <pcc@google.com>
> > ---
> >  arch/arm64/include/asm/kvm_pkvm.h | 4 +++-
> >  arch/arm64/kvm/hyp/nvhe/pkvm.c    | 6 +++---
> >  arch/arm64/kvm/mmu.c              | 4 +++-
> >  3 files changed, 9 insertions(+), 5 deletions(-)
> >
> > diff --git a/arch/arm64/include/asm/kvm_pkvm.h b/arch/arm64/include/asm/kvm_pkvm.h
> > index 952e3c3fa32d..9ca9296f2a25 100644
> > --- a/arch/arm64/include/asm/kvm_pkvm.h
> > +++ b/arch/arm64/include/asm/kvm_pkvm.h
> > @@ -73,10 +73,12 @@ void kvm_shadow_destroy(struct kvm *kvm);
> >   * Allow for protected VMs:
> >   * - Branch Target Identification
> >   * - Speculative Store Bypassing
> > + * - Memory Tagging Extension
> >   */
> >  #define PVM_ID_AA64PFR1_ALLOW (\
> >       ARM64_FEATURE_MASK(ID_AA64PFR1_BT) | \
> > -     ARM64_FEATURE_MASK(ID_AA64PFR1_SSBS) \
> > +     ARM64_FEATURE_MASK(ID_AA64PFR1_SSBS) | \
> > +     ARM64_FEATURE_MASK(ID_AA64PFR1_MTE) \
> >       )
> >
> >  /*
> > diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
> > index e33ba9067d7b..46ddd9093ac7 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
> > @@ -88,7 +88,7 @@ static void pvm_init_traps_aa64pfr1(struct kvm_vcpu *vcpu)
> >       /* Memory Tagging: Trap and Treat as Untagged if not supported. */
> >       if (!FIELD_GET(ARM64_FEATURE_MASK(ID_AA64PFR1_MTE), feature_ids)) {
> >               hcr_set |= HCR_TID5;
> > -             hcr_clear |= HCR_DCT | HCR_ATA;
> > +             hcr_clear |= HCR_ATA;
> >       }
> >
> >       vcpu->arch.hcr_el2 |= hcr_set;
> > @@ -179,8 +179,8 @@ static void pvm_init_trap_regs(struct kvm_vcpu *vcpu)
> >        * - Feature id registers: to control features exposed to guests
> >        * - Implementation-defined features
> >        */
> > -     vcpu->arch.hcr_el2 = HCR_GUEST_FLAGS |
> > -                          HCR_TID3 | HCR_TACR | HCR_TIDCP | HCR_TID1;
> > +     vcpu->arch.hcr_el2 = HCR_GUEST_FLAGS | HCR_TID3 | HCR_TACR | HCR_TIDCP |
> > +                          HCR_TID1 | HCR_ATA;
> >
> >       if (cpus_have_const_cap(ARM64_HAS_RAS_EXTN)) {
> >               /* route synchronous external abort exceptions to EL2 */
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index 392ff7b2362d..f513852357f7 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -1206,8 +1206,10 @@ static int pkvm_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >               goto dec_account;
> >       }
> >
> > -     write_lock(&kvm->mmu_lock);
> >       pfn = page_to_pfn(page);
> > +     sanitise_mte_tags(kvm, pfn, PAGE_SIZE);
> > +
> > +     write_lock(&kvm->mmu_lock);
>
> Is it really safe to rely on the host to clear the tags? My guts
> feeling says that it isn't. If it is required, we cannot leave this
> responsibility to the host, and this logic must be moved to EL2. And
> if it isn't, then we should drop it.

The goal here isn't to protect the guest. It's already the case that
whatever the page contents are when the page is donated (from the
perspective of the KVM client), that's what the guest sees. That
applies to both data and (in non-protected VMs) tags.

The code that I added here is for solving a different problem, which
is to avoid exposing stale host state to the guest, which the KVM
client may not even be aware of. We sanitize pages before exposing
them in non-protected VMs for the same reason.

> >       ret = pkvm_host_map_guest(pfn, fault_ipa >> PAGE_SHIFT);
> >       if (ret) {
> >               if (ret == -EAGAIN)
>
> But the bigger picture here is what ensures that the host cannot mess
> with the guest tags? I don't think we have a any mechanism to
> guarantee that, specially on systems where the tags are only a memory
> carve-out, which the host could map and change at will.

Right, I forgot about that. We probably only want to expose MTE to
guests if we have some indication (through the device tree or ACPI) of
how to protect the guest tag storage.

> In any case, this isn't the time to pile new features on top of
> pKVM. The current plan is to not support MTE at all, and only do it
> once we have a definitive story on page donation (which as you may
> have noticed, is pretty hacky). I don't see any compelling reason to
> add MTE to the mix until this is solved.

It sounds reasonable to land a basic set of features to begin with and
add MTE later. I'll develop my MTE-in-pKVM patch series as a followup
on top of this series.

Peter
Peter Collingbourne June 7, 2022, 12:42 a.m. UTC | #6
On Thu, May 19, 2022 at 7:40 AM Will Deacon <will@kernel.org> wrote:
>
> From: Fuad Tabba <tabba@google.com>
>
> Return an error (-EINVAL) if trying to enable MTE on a protected
> vm.
>
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
>  arch/arm64/kvm/arm.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 10e036bf06e3..8a1b4ba1dfa7 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -90,7 +90,9 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>                 break;
>         case KVM_CAP_ARM_MTE:
>                 mutex_lock(&kvm->lock);
> -               if (!system_supports_mte() || kvm->created_vcpus) {
> +               if (!system_supports_mte() ||
> +                   kvm_vm_is_protected(kvm) ||

Should this check be added to kvm_vm_ioctl_check_extension() as well?

Peter
Fuad Tabba June 8, 2022, 7:40 a.m. UTC | #7
Hi Peter,

On Tue, Jun 7, 2022 at 1:42 AM Peter Collingbourne <pcc@google.com> wrote:
>
> On Thu, May 19, 2022 at 7:40 AM Will Deacon <will@kernel.org> wrote:
> >
> > From: Fuad Tabba <tabba@google.com>
> >
> > Return an error (-EINVAL) if trying to enable MTE on a protected
> > vm.
> >
> > Signed-off-by: Fuad Tabba <tabba@google.com>
> > ---
> >  arch/arm64/kvm/arm.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > index 10e036bf06e3..8a1b4ba1dfa7 100644
> > --- a/arch/arm64/kvm/arm.c
> > +++ b/arch/arm64/kvm/arm.c
> > @@ -90,7 +90,9 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> >                 break;
> >         case KVM_CAP_ARM_MTE:
> >                 mutex_lock(&kvm->lock);
> > -               if (!system_supports_mte() || kvm->created_vcpus) {
> > +               if (!system_supports_mte() ||
> > +                   kvm_vm_is_protected(kvm) ||
>
> Should this check be added to kvm_vm_ioctl_check_extension() as well?

No need. kvm_vm_ioctl_check_extension() calls pkvm_check_extension()
for protected vms, which functions as an allow list rather than a
block list.

Cheers,
/fuad


> Peter
Peter Collingbourne June 8, 2022, 5:39 p.m. UTC | #8
On Wed, Jun 8, 2022 at 12:40 AM Fuad Tabba <tabba@google.com> wrote:
>
> Hi Peter,
>
> On Tue, Jun 7, 2022 at 1:42 AM Peter Collingbourne <pcc@google.com> wrote:
> >
> > On Thu, May 19, 2022 at 7:40 AM Will Deacon <will@kernel.org> wrote:
> > >
> > > From: Fuad Tabba <tabba@google.com>
> > >
> > > Return an error (-EINVAL) if trying to enable MTE on a protected
> > > vm.
> > >
> > > Signed-off-by: Fuad Tabba <tabba@google.com>
> > > ---
> > >  arch/arm64/kvm/arm.c | 4 +++-
> > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > > index 10e036bf06e3..8a1b4ba1dfa7 100644
> > > --- a/arch/arm64/kvm/arm.c
> > > +++ b/arch/arm64/kvm/arm.c
> > > @@ -90,7 +90,9 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> > >                 break;
> > >         case KVM_CAP_ARM_MTE:
> > >                 mutex_lock(&kvm->lock);
> > > -               if (!system_supports_mte() || kvm->created_vcpus) {
> > > +               if (!system_supports_mte() ||
> > > +                   kvm_vm_is_protected(kvm) ||
> >
> > Should this check be added to kvm_vm_ioctl_check_extension() as well?
>
> No need. kvm_vm_ioctl_check_extension() calls pkvm_check_extension()
> for protected vms, which functions as an allow list rather than a
> block list.

I see. I guess I got confused when reading the code because I saw this
in kvm_check_extension():

        case KVM_CAP_ARM_NISV_TO_USER:
                r = !kvm || !kvm_vm_is_protected(kvm);
                break;

This can probably be simplified to "r = 1;".

Peter
Catalin Marinas June 8, 2022, 6:41 p.m. UTC | #9
On Mon, Jun 06, 2022 at 05:20:39PM -0700, Peter Collingbourne wrote:
> On Sat, Jun 4, 2022 at 1:26 AM Marc Zyngier <maz@kernel.org> wrote:
> > But the bigger picture here is what ensures that the host cannot mess
> > with the guest tags? I don't think we have a any mechanism to
> > guarantee that, specially on systems where the tags are only a memory
> > carve-out, which the host could map and change at will.
> 
> Right, I forgot about that. We probably only want to expose MTE to
> guests if we have some indication (through the device tree or ACPI) of
> how to protect the guest tag storage.

I think this would be useful irrespective of MTE. Some SoCs (though I
hope very rare these days) may allow for physical aliasing of RAM but if
the host stage 2 only protects one of the aliases, it's not of much use.

I am yet to fully understand how pKVM works but with the separation of
the hyp from the host kernel, it may have to actually parse the
DT/ACPI/EFI tables itself if it cannot rely on what the host kernel told
it. IIUC currently it creates an idmap at stage 2 for the host kernel,
only unmapped if the memory was assigned to a guest. But not sure what
happens with the rest of the host physical address space (devices etc.),
I presume they are fully accessible by the host kernel in stage 2.
diff mbox series

Patch

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 10e036bf06e3..8a1b4ba1dfa7 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -90,7 +90,9 @@  int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 		break;
 	case KVM_CAP_ARM_MTE:
 		mutex_lock(&kvm->lock);
-		if (!system_supports_mte() || kvm->created_vcpus) {
+		if (!system_supports_mte() ||
+		    kvm_vm_is_protected(kvm) ||
+		    kvm->created_vcpus) {
 			r = -EINVAL;
 		} else {
 			r = 0;