
[16/18] KVM: x86: Take mem attributes into account when faulting memory

Message ID 20240609154945.55332-17-nsaenz@amazon.com (mailing list archive)
State New, archived
Series Introducing Core Building Blocks for Hyper-V VSM Emulation

Commit Message

Nicolas Saenz Julienne June 9, 2024, 3:49 p.m. UTC
Take access-restriction memory attributes into account when faulting in
guest memory. Prohibited memory accesses will cause a user-space fault
exit.

Additionally, bypass a warning in the !tdp case: access restrictions in
the guest page tables won't necessarily match the host PTEs when memory
attributes are in use.

Signed-off-by: Nicolas Saenz Julienne <nsaenz@amazon.com>
---
 arch/x86/kvm/mmu/mmu.c         | 64 ++++++++++++++++++++++++++++------
 arch/x86/kvm/mmu/mmutrace.h    | 29 +++++++++++++++
 arch/x86/kvm/mmu/paging_tmpl.h |  2 +-
 include/linux/kvm_host.h       |  4 +++
 4 files changed, 87 insertions(+), 12 deletions(-)
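
For reference, user-space sees these prohibited accesses as
KVM_EXIT_MEMORY_FAULT exits (that's what kvm_mmu_prepare_memory_fault_exit()
sets up). A rough, untested sketch of the VMM side, where
handle_vsm_protection_fault() is a made-up placeholder for whatever
VTL-switch or injection logic the VMM implements:

#include <linux/kvm.h>

/* Hypothetical VMM hook: resolve or report the blocked access. */
int handle_vsm_protection_fault(__u64 gpa, __u64 size);

static int handle_kvm_exit(struct kvm_run *run)
{
	switch (run->exit_reason) {
	case KVM_EXIT_MEMORY_FAULT:
		/* GPA range whose memory attributes forbid the access. */
		return handle_vsm_protection_fault(run->memory_fault.gpa,
						   run->memory_fault.size);
	default:
		return -1;
	}
}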

Comments

Nicolas Saenz Julienne Aug. 22, 2024, 3:21 p.m. UTC | #1
On Sun Jun 9, 2024 at 3:49 PM UTC, Nicolas Saenz Julienne wrote:
> Take access-restriction memory attributes into account when faulting in
> guest memory. Prohibited memory accesses will cause a user-space fault
> exit.
>
> Additionally, bypass a warning in the !tdp case: access restrictions in
> the guest page tables won't necessarily match the host PTEs when memory
> attributes are in use.
>
> Signed-off-by: Nicolas Saenz Julienne <nsaenz@amazon.com>

I now realize that only taking into account memory attributes during
faults isn't good enough for VSM. We should check the attributes anytime
KVM takes GPAs as input for any action initiated by the guest. If the
memory attributes are incompatible with such an action, it should be
stopped. Failure to do so opens side channels that unprivileged VTLs can
abuse to infer information about privileged VTLs. Some examples I came up
with:
- Guest page walks: VTL0 could install malicious directory entries that
  point to GPAs only visible to VTL1. KVM will happily continue the
  walk. Among other things, this could be used to infer VTL1's GVA->GPA
  mappings.
- PV interfaces like the Hyper-V TSC page or VP assist page could be
  used to modify portions of VTL1 memory.
- Hyper-V hypercalls that take GPAs as input/output can be abused in a
  myriad of ways, including ones that exit into user-space.

We would be protected against all these if we implemented the memory
access restrictions through the memory slots API. As is, it has the
drawback of having to quiesce the whole VM for any non-trivial slot
modification (i.e. VSM's memory protections). But if we found a way to
speed up the slot updates we could rely on that, and avoid having to
teach kvm_read/write_guest() and friends to deal with memattrs. Note
that we would still need to use memory attributes to request that
faults on those select GPAs exit to user-space. Any opinions or
suggestions?

Note that, for now, I'll stick with the memory attributes approach to
see what the full solution looks like.
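
To make it concrete, this is roughly the shape of check that would have
to gate the GPA-consuming paths (untested sketch: kvm_check_gpa_access()
and kvm_read_guest_checked() are made-up names, it only looks at the
first gfn, and it reuses the kvm_mem_attributes_may_*() helpers this
series introduces):

static int kvm_check_gpa_access(struct kvm *kvm, gpa_t gpa, bool write)
{
	unsigned long attrs = kvm_get_memory_attributes(kvm, gpa_to_gfn(gpa));

	/* No attributes set for this gfn, nothing to restrict. */
	if (!attrs)
		return 0;

	if (write ? !kvm_mem_attributes_may_write(attrs) :
		    !kvm_mem_attributes_may_read(attrs))
		return -EPERM;

	return 0;
}

/* Attribute-aware variant of kvm_read_guest() for guest-initiated reads. */
static int kvm_read_guest_checked(struct kvm *kvm, gpa_t gpa, void *data,
				  unsigned long len)
{
	int r = kvm_check_gpa_access(kvm, gpa, false);

	return r ? r : kvm_read_guest(kvm, gpa, data, len);
}

The write side and the guest page walker would need the same treatment,
which is where it starts getting invasive.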

Nicolas
Sean Christopherson Aug. 22, 2024, 4:58 p.m. UTC | #2
On Thu, Aug 22, 2024, Nicolas Saenz Julienne wrote:
> On Sun Jun 9, 2024 at 3:49 PM UTC, Nicolas Saenz Julienne wrote:
> > Take access-restriction memory attributes into account when faulting in
> > guest memory. Prohibited memory accesses will cause a user-space fault
> > exit.
> >
> > Additionally, bypass a warning in the !tdp case: access restrictions in
> > the guest page tables won't necessarily match the host PTEs when memory
> > attributes are in use.
> >
> > Signed-off-by: Nicolas Saenz Julienne <nsaenz@amazon.com>
> 
> I now realize that only taking into account memory attributes during
> faults isn't good enough for VSM. We should check the attributes anytime
> KVM takes GPAs as input for any action initiated by the guest. If the
> memory attributes are incompatible with such an action, it should be
> stopped. Failure to do so opens side channels that unprivileged VTLs can
> abuse to infer information about privileged VTLs. Some examples I came up
> with:
> - Guest page walks: VTL0 could install malicious directory entries that
>   point to GPAs only visible to VTL1. KVM will happily continue the
>   walk. Among other things, this could be used to infer VTL1's GVA->GPA
>   mappings.
> - PV interfaces like the Hyper-V TSC page or VP assist page could be
>   used to modify portions of VTL1 memory.
> - Hyper-V hypercalls that take GPAs as input/output can be abused in a
>   myriad of ways, including ones that exit into user-space.
> 
> We would be protected against all these if we implemented the memory
> access restrictions through the memory slots API. As is, it has the
> drawback of having to quiesce the whole VM for any non-trivial slot
> modification (i.e. VSM's memory protections). But if we found a way to
> speed up the slot updates we could rely on that, and avoid having to
> teach kvm_read/write_guest() and friends to deal with memattrs. Note
> that we would still need to use memory attributes to request that
> faults on those select GPAs exit to user-space. Any opinions or
> suggestions?
> 
> Note that, for now, I'll stick with the memory attributes approach to
> see what the full solution looks like.

FWIW, I suspect we'll be better off honoring memory attributes.  It's not just
the KVM side that has issues with memslot updates; my understanding is that
userspace has also built up "slow" code with respect to memslot updates, in
part because it's such a slow path in KVM.
Nicolas Saenz Julienne Sept. 13, 2024, 6:26 p.m. UTC | #3
On Thu Aug 22, 2024 at 4:58 PM UTC, Sean Christopherson wrote:
> On Thu, Aug 22, 2024, Nicolas Saenz Julienne wrote:
> > On Sun Jun 9, 2024 at 3:49 PM UTC, Nicolas Saenz Julienne wrote:
> > > Take access-restriction memory attributes into account when faulting in
> > > guest memory. Prohibited memory accesses will cause a user-space fault
> > > exit.
> > >
> > > Additionally, bypass a warning in the !tdp case: access restrictions in
> > > the guest page tables won't necessarily match the host PTEs when memory
> > > attributes are in use.
> > >
> > > Signed-off-by: Nicolas Saenz Julienne <nsaenz@amazon.com>
> >
> > I now realize that only taking into account memory attributes during
> > faults isn't good enough for VSM. We should check the attributes anytime
> > KVM takes GPAs as input for any action initiated by the guest. If the
> > memory attributes are incompatible with such an action, it should be
> > stopped. Failure to do so opens side channels that unprivileged VTLs can
> > abuse to infer information about privileged VTLs. Some examples I came up
> > with:
> > - Guest page walks: VTL0 could install malicious directory entries that
> >   point to GPAs only visible to VTL1. KVM will happily continue the
> >   walk. Among other things, this could be used to infer VTL1's GVA->GPA
> >   mappings.
> > - PV interfaces like the Hyper-V TSC page or VP assist page could be
> >   used to modify portions of VTL1 memory.
> > - Hyper-V hypercalls that take GPAs as input/output can be abused in a
> >   myriad of ways, including ones that exit into user-space.
> >
> > We would be protected against all these if we implemented the memory
> > access restrictions through the memory slots API. As is, it has the
> > drawback of having to quiesce the whole VM for any non-trivial slot
> > modification (i.e. VSM's memory protections). But if we found a way to
> > speed up the slot updates we could rely on that, and avoid having to
> > teach kvm_read/write_guest() and friends to deal with memattrs. Note
> > that we would still need to use memory attributes to request that
> > faults on those select GPAs exit to user-space. Any opinions or
> > suggestions?
> >
> > Note that, for now, I'll stick with the memory attributes approach to
> > see what the full solution looks like.
>
> FWIW, I suspect we'll be better off honoring memory attributes.  It's not just
> the KVM side that has issues with memslot updates; my understanding is that
> userspace has also built up "slow" code with respect to memslot updates, in
> part because it's such a slow path in KVM.

Sean, since I see you're looking at the series: I don't think it's worth
spending too much time on the memory attributes patches. Since figuring
out the side channels mentioned above, I've found even more shortcomings
in this implementation. I'm reworking the whole thing in a separate
series [1], taking into account side channels, MMIO, non-TDP MMUs, etc.,
and introducing selftests and an in-depth design document.

[1] https://github.com/vianpl/linux branch 'vsm/memory-protections' (wip)

Thanks,
Nicolas

Patch

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 91edd873dcdbc..dfe50c9c31f7b 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -754,7 +754,8 @@  static u32 kvm_mmu_page_get_access(struct kvm_mmu_page *sp, int index)
 	return sp->role.access;
 }
 
-static void kvm_mmu_page_set_translation(struct kvm_mmu_page *sp, int index,
+static void kvm_mmu_page_set_translation(struct kvm *kvm,
+					 struct kvm_mmu_page *sp, int index,
 					 gfn_t gfn, unsigned int access)
 {
 	if (sp_has_gptes(sp)) {
@@ -762,10 +763,17 @@  static void kvm_mmu_page_set_translation(struct kvm_mmu_page *sp, int index,
 		return;
 	}
 
-	WARN_ONCE(access != kvm_mmu_page_get_access(sp, index),
-	          "access mismatch under %s page %llx (expected %u, got %u)\n",
-	          sp->role.passthrough ? "passthrough" : "direct",
-	          sp->gfn, kvm_mmu_page_get_access(sp, index), access);
+	/*
+	 * Userspace might have introduced memory attributes for this gfn,
+	 * breaking the assumption that the spte's access restrictions match
+	 * the guest's. Userspace is also responsible for taking care of
+	 * faults caused by these 'artificial' access restrictions.
+	 */
+	WARN_ONCE(access != kvm_mmu_page_get_access(sp, index) &&
+		  !kvm_get_memory_attributes(kvm, gfn),
+		  "access mismatch under %s page %llx (expected %u, got %u)\n",
+		  sp->role.passthrough ? "passthrough" : "direct", sp->gfn,
+		  kvm_mmu_page_get_access(sp, index), access);
 
 	WARN_ONCE(gfn != kvm_mmu_page_get_gfn(sp, index),
 	          "gfn mismatch under %s page %llx (expected %llx, got %llx)\n",
@@ -773,12 +781,12 @@  static void kvm_mmu_page_set_translation(struct kvm_mmu_page *sp, int index,
 	          sp->gfn, kvm_mmu_page_get_gfn(sp, index), gfn);
 }
 
-static void kvm_mmu_page_set_access(struct kvm_mmu_page *sp, int index,
-				    unsigned int access)
+static void kvm_mmu_page_set_access(struct kvm *kvm, struct kvm_mmu_page *sp,
+				    int index, unsigned int access)
 {
 	gfn_t gfn = kvm_mmu_page_get_gfn(sp, index);
 
-	kvm_mmu_page_set_translation(sp, index, gfn, access);
+	kvm_mmu_page_set_translation(kvm, sp, index, gfn, access);
 }
 
 /*
@@ -1607,7 +1615,7 @@  static void __rmap_add(struct kvm *kvm,
 	int rmap_count;
 
 	sp = sptep_to_sp(spte);
-	kvm_mmu_page_set_translation(sp, spte_index(spte), gfn, access);
+	kvm_mmu_page_set_translation(kvm, sp, spte_index(spte), gfn, access);
 	kvm_update_page_stats(kvm, sp->role.level, 1);
 
 	rmap_head = gfn_to_rmap(gfn, sp->role.level, slot);
@@ -2928,7 +2936,8 @@  static int mmu_set_spte(struct kvm_vcpu *vcpu, struct kvm_memory_slot *slot,
 		rmap_add(vcpu, slot, sptep, gfn, pte_access);
 	} else {
 		/* Already rmapped but the pte_access bits may have changed. */
-		kvm_mmu_page_set_access(sp, spte_index(sptep), pte_access);
+		kvm_mmu_page_set_access(vcpu->kvm, sp, spte_index(sptep),
+					pte_access);
 	}
 
 	return ret;
@@ -4320,6 +4329,38 @@  static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
 	return RET_PF_CONTINUE;
 }
 
+static int kvm_mem_attributes_faultin_access_prots(struct kvm_vcpu *vcpu,
+						   struct kvm_page_fault *fault)
+{
+	bool may_read, may_write, may_exec;
+	unsigned long attrs;
+
+	attrs = kvm_get_memory_attributes(vcpu->kvm, fault->gfn);
+	if (!attrs)
+		return RET_PF_CONTINUE;
+
+	if (!kvm_mem_attributes_valid(vcpu->kvm, attrs)) {
+		kvm_err("Invalid mem attributes 0x%lx found for address 0x%016llx\n",
+			attrs, fault->addr);
+		return -EFAULT;
+	}
+
+	trace_kvm_mem_attributes_faultin_access_prots(vcpu, fault, attrs);
+
+	may_read = kvm_mem_attributes_may_read(attrs);
+	may_write = kvm_mem_attributes_may_write(attrs);
+	may_exec = kvm_mem_attributes_may_exec(attrs);
+
+	if ((fault->user && !may_read) || (fault->write && !may_write) ||
+	    (fault->exec && !may_exec))
+		return -EFAULT;
+
+	fault->map_writable = may_write;
+	fault->map_executable = may_exec;
+
+	return RET_PF_CONTINUE;
+}
+
 static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 {
 	bool async;
@@ -4375,7 +4416,8 @@  static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
 	 * Now that we have a snapshot of mmu_invalidate_seq we can check for a
 	 * private vs. shared mismatch.
 	 */
-	if (fault->is_private != kvm_mem_is_private(vcpu->kvm, fault->gfn)) {
+	if (fault->is_private != kvm_mem_is_private(vcpu->kvm, fault->gfn) ||
+	    kvm_mem_attributes_faultin_access_prots(vcpu, fault)) {
 		kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
 		return -EFAULT;
 	}
diff --git a/arch/x86/kvm/mmu/mmutrace.h b/arch/x86/kvm/mmu/mmutrace.h
index 195d98bc8de85..ddbdd7396e9fa 100644
--- a/arch/x86/kvm/mmu/mmutrace.h
+++ b/arch/x86/kvm/mmu/mmutrace.h
@@ -440,6 +440,35 @@  TRACE_EVENT(
 		  __entry->gfn, __entry->spte, __entry->level, __entry->errno)
 );
 
+TRACE_EVENT(kvm_mem_attributes_faultin_access_prots,
+	TP_PROTO(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
+		 u64 mem_attrs),
+	TP_ARGS(vcpu, fault, mem_attrs),
+
+	TP_STRUCT__entry(
+		__field(unsigned int, vcpu_id)
+		__field(unsigned long, guest_rip)
+		__field(u64, fault_address)
+		__field(bool, write)
+		__field(bool, exec)
+		__field(u64, mem_attrs)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu_id = vcpu->vcpu_id;
+		__entry->guest_rip = kvm_rip_read(vcpu);
+		__entry->fault_address = fault->addr;
+		__entry->write = fault->write;
+		__entry->exec = fault->exec;
+		__entry->mem_attrs = mem_attrs;
+	),
+
+	TP_printk("vcpu %d rip 0x%lx gfn 0x%016llx access %s mem_attrs 0x%llx",
+		  __entry->vcpu_id, __entry->guest_rip, __entry->fault_address,
+		  __entry->exec ? "X" : (__entry->write ? "W" : "R"),
+		  __entry->mem_attrs)
+);
+
 #endif /* _TRACE_KVMMMU_H */
 
 #undef TRACE_INCLUDE_PATH
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index d3dbcf382ed2d..166f5f0e885e0 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -954,7 +954,7 @@  static int FNAME(sync_spte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, int
 		return 0;
 
 	/* Update the shadowed access bits in case they changed. */
-	kvm_mmu_page_set_access(sp, i, pte_access);
+	kvm_mmu_page_set_access(vcpu->kvm, sp, i, pte_access);
 
 	sptep = &sp->spt[i];
 	spte = *sptep;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 85378345e8e77..9c26161d13dea 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2463,6 +2463,10 @@  static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
 {
 	return false;
 }
+static inline unsigned long kvm_get_memory_attributes(struct kvm *kvm, gfn_t gfn)
+{
+	return 0;
+}
 #endif /* CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES */
 
 #ifdef CONFIG_KVM_PRIVATE_MEM