Message ID | 20240530210714.364118-4-rick.p.edgecombe@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | TDX MMU prep series part 1 | expand |
On Thu, May 30, 2024 at 11:07 PM Rick Edgecombe <rick.p.edgecombe@intel.com> wrote: > > From: Isaku Yamahata <isaku.yamahata@intel.com> > > Add a mirrored pointer to struct kvm_mmu_page for the private page table and > add helper functions to allocate/initialize/free a private page table page. > Because KVM TDP MMU doesn't use unsync_children and write_flooding_count, > pack them to have room for a pointer and use a union to avoid memory > overhead. > > For private GPA, CPU refers to a private page table whose contents are > encrypted. The dedicated APIs to operate on it (e.g. updating/reading its > PTE entry) are used, and their cost is expensive. > > When KVM resolves the KVM page fault, it walks the page tables. To reuse > the existing KVM MMU code and mitigate the heavy cost of directly walking > the private page table, allocate one more page for the mirrored page table > for the KVM MMU code to directly walk. Resolve the KVM page fault with > the existing code, and do additional operations necessary for the private > page table. To distinguish such cases, the existing KVM page table is > called a shared page table (i.e., not associated with a private page > table), and the page table with a private page table is called a mirrored > page table. The relationship is depicted below. > > KVM page fault | > | | > V | > -------------+---------- | > | | | > V V | > shared GPA private GPA | > | | | > V V | > shared PT root mirror PT root | private PT root > | | | | > V V | V > shared PT mirror PT --propagate--> private/mirrored PT > | | | | > | \-----------------+------\ | > | | | | > V | V V > shared guest page | private guest page > | > non-encrypted memory | encrypted memory > | > PT: Page table > Shared PT: visible to KVM, and the CPU uses it for shared mappings. > Private/mirrored PT: the CPU uses it, but it is invisible to KVM. TDX > module updates this table to map private guest pages. > Mirror PT: It is visible to KVM, but the CPU doesn't use it. KVM uses it > to propagate PT change to the actual private PT. Which one is the "Mirror" and which one is the "Mirrored" PT is uncomfortably confusing. I hate to bikeshed even more, but while I like "Mirror PT" (a lot), I would stick with "Private" or perhaps "External" for the pages owned by the TDX module. > + /* > + * This cache is to allocate private page table. E.g. private EPT used > + * by the TDX module. > + */ > + struct kvm_mmu_memory_cache mmu_mirrored_spt_cache; So this would be "mmu_external_spt_cache". > - unsigned int unsync_children; > + union { > + /* Those two members aren't used for TDP MMU */ s/Those/These/ > + struct { > + unsigned int unsync_children; > + /* > + * Number of writes since the last time traversal > + * visited this page. > + */ > + atomic_t write_flooding_count; > + }; > + /* > + * Page table page of private PT. > + * Passed to TDX module, not accessed by KVM. > + */ > + void *mirrored_spt; external_spt > +static inline void kvm_mmu_alloc_mirrored_spt(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp) > +{ > + /* > + * mirrored_spt is allocated for TDX module to hold private EPT mappings, > + * TDX module will initialize the page by itself. > + * Therefore, KVM does not need to initialize or access mirrored_spt. > + * KVM only interacts with sp->spt for mirrored EPT operations. > + */ > + sp->mirrored_spt = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_mirrored_spt_cache); > +} > + > +static inline void kvm_mmu_alloc_private_spt(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp) > +{ > + /* > + * private_spt is allocated for TDX module to hold private EPT mappings, > + * TDX module will initialize the page by itself. > + * Therefore, KVM does not need to initialize or access private_spt. > + * KVM only interacts with sp->spt for mirrored EPT operations. > + */ > + sp->mirrored_spt = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_mirrored_spt_cache); > +} Duplicate function. Naming aside, looks good. Paolo On Thu, May 30, 2024 at 11:07 PM Rick Edgecombe <rick.p.edgecombe@intel.com> wrote: > > From: Isaku Yamahata <isaku.yamahata@intel.com> > > Add a mirrored pointer to struct kvm_mmu_page for the private page table and > add helper functions to allocate/initialize/free a private page table page. > Because KVM TDP MMU doesn't use unsync_children and write_flooding_count, > pack them to have room for a pointer and use a union to avoid memory > overhead. > > For private GPA, CPU refers to a private page table whose contents are > encrypted. The dedicated APIs to operate on it (e.g. updating/reading its > PTE entry) are used, and their cost is expensive. > > When KVM resolves the KVM page fault, it walks the page tables. To reuse > the existing KVM MMU code and mitigate the heavy cost of directly walking > the private page table, allocate one more page for the mirrored page table > for the KVM MMU code to directly walk. Resolve the KVM page fault with > the existing code, and do additional operations necessary for the private > page table. To distinguish such cases, the existing KVM page table is > called a shared page table (i.e., not associated with a private page > table), and the page table with a private page table is called a mirrored > page table. The relationship is depicted below. > > KVM page fault | > | | > V | > -------------+---------- | > | | | > V V | > shared GPA private GPA | > | | | > V V | > shared PT root mirror PT root | private PT root > | | | | > V V | V > shared PT mirror PT --propagate--> private/mirrored PT > | | | | > | \-----------------+------\ | > | | | | > V | V V > shared guest page | private guest page > | > non-encrypted memory | encrypted memory > | > PT: Page table > Shared PT: visible to KVM, and the CPU uses it for shared mappings. > Private/mirrored PT: the CPU uses it, but it is invisible to KVM. TDX > module updates this table to map private guest pages. > Mirror PT: It is visible to KVM, but the CPU doesn't use it. KVM uses it > to propagate PT change to the actual private PT. > > Add a helper kvm_has_mirrored_tdp() to trigger this behavior and wire it > to the TDX vm type. > > Co-developed-by: Yan Zhao <yan.y.zhao@intel.com> > Signed-off-by: Yan Zhao <yan.y.zhao@intel.com> > Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> > Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> > Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> > --- > TDX MMU Prep v2: > - Rename private->mirror > - Don't trigger off of shared mask > > TDX MMU Prep: > - Rename terminology, dummy PT => mirror PT. and updated the commit message > By Rick and Kai. > - Added a comment on union of private_spt by Rick. > - Don't handle the root case in kvm_mmu_alloc_private_spt(), it will not > be needed in future patches. (Rick) > - Update comments (Yan) > - Remove kvm_mmu_init_private_spt(), open code it in later patches (Yan) > > v19: > - typo in the comment in kvm_mmu_alloc_private_spt() > - drop CONFIG_KVM_MMU_PRIVATE > --- > arch/x86/include/asm/kvm_host.h | 5 ++++ > arch/x86/kvm/mmu.h | 5 ++++ > arch/x86/kvm/mmu/mmu.c | 7 +++++ > arch/x86/kvm/mmu/mmu_internal.h | 47 ++++++++++++++++++++++++++++++--- > arch/x86/kvm/mmu/tdp_mmu.c | 1 + > 5 files changed, 61 insertions(+), 4 deletions(-) > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h > index aabf1648a56a..250899a0239b 100644 > --- a/arch/x86/include/asm/kvm_host.h > +++ b/arch/x86/include/asm/kvm_host.h > @@ -817,6 +817,11 @@ struct kvm_vcpu_arch { > struct kvm_mmu_memory_cache mmu_shadow_page_cache; > struct kvm_mmu_memory_cache mmu_shadowed_info_cache; > struct kvm_mmu_memory_cache mmu_page_header_cache; > + /* > + * This cache is to allocate private page table. E.g. private EPT used > + * by the TDX module. > + */ > + struct kvm_mmu_memory_cache mmu_mirrored_spt_cache; > > /* > * QEMU userspace and the guest each have their own FPU state. > diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h > index dc80e72e4848..0c3bf89cf7db 100644 > --- a/arch/x86/kvm/mmu.h > +++ b/arch/x86/kvm/mmu.h > @@ -318,4 +318,9 @@ static inline gpa_t kvm_translate_gpa(struct kvm_vcpu *vcpu, > return gpa; > return translate_nested_gpa(vcpu, gpa, access, exception); > } > + > +static inline bool kvm_has_mirrored_tdp(const struct kvm *kvm) > +{ > + return kvm->arch.vm_type == KVM_X86_TDX_VM; > +} > #endif > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c > index b97241945596..5070ba7c6e89 100644 > --- a/arch/x86/kvm/mmu/mmu.c > +++ b/arch/x86/kvm/mmu/mmu.c > @@ -685,6 +685,12 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indirect) > 1 + PT64_ROOT_MAX_LEVEL + PTE_PREFETCH_NUM); > if (r) > return r; > + if (kvm_has_mirrored_tdp(vcpu->kvm)) { > + r = kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_mirrored_spt_cache, > + PT64_ROOT_MAX_LEVEL); > + if (r) > + return r; > + } > r = kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_shadow_page_cache, > PT64_ROOT_MAX_LEVEL); > if (r) > @@ -704,6 +710,7 @@ static void mmu_free_memory_caches(struct kvm_vcpu *vcpu) > kvm_mmu_free_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache); > kvm_mmu_free_memory_cache(&vcpu->arch.mmu_shadow_page_cache); > kvm_mmu_free_memory_cache(&vcpu->arch.mmu_shadowed_info_cache); > + kvm_mmu_free_memory_cache(&vcpu->arch.mmu_mirrored_spt_cache); > kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_header_cache); > } > > diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h > index 706f0ce8784c..faef40a561f9 100644 > --- a/arch/x86/kvm/mmu/mmu_internal.h > +++ b/arch/x86/kvm/mmu/mmu_internal.h > @@ -101,7 +101,22 @@ struct kvm_mmu_page { > int root_count; > refcount_t tdp_mmu_root_count; > }; > - unsigned int unsync_children; > + union { > + /* Those two members aren't used for TDP MMU */ > + struct { > + unsigned int unsync_children; > + /* > + * Number of writes since the last time traversal > + * visited this page. > + */ > + atomic_t write_flooding_count; > + }; > + /* > + * Page table page of private PT. > + * Passed to TDX module, not accessed by KVM. > + */ > + void *mirrored_spt; > + }; > union { > struct kvm_rmap_head parent_ptes; /* rmap pointers to parent sptes */ > tdp_ptep_t ptep; > @@ -124,9 +139,6 @@ struct kvm_mmu_page { > int clear_spte_count; > #endif > > - /* Number of writes since the last time traversal visited this page. */ > - atomic_t write_flooding_count; > - > #ifdef CONFIG_X86_64 > /* Used for freeing the page asynchronously if it is a TDP MMU page. */ > struct rcu_head rcu_head; > @@ -145,6 +157,33 @@ static inline int kvm_mmu_page_as_id(struct kvm_mmu_page *sp) > return kvm_mmu_role_as_id(sp->role); > } > > +static inline void *kvm_mmu_mirrored_spt(struct kvm_mmu_page *sp) > +{ > + return sp->mirrored_spt; > +} > + > +static inline void kvm_mmu_alloc_mirrored_spt(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp) > +{ > + /* > + * mirrored_spt is allocated for TDX module to hold private EPT mappings, > + * TDX module will initialize the page by itself. > + * Therefore, KVM does not need to initialize or access mirrored_spt. > + * KVM only interacts with sp->spt for mirrored EPT operations. > + */ > + sp->mirrored_spt = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_mirrored_spt_cache); > +} > + > +static inline void kvm_mmu_alloc_private_spt(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp) > +{ > + /* > + * private_spt is allocated for TDX module to hold private EPT mappings, > + * TDX module will initialize the page by itself. > + * Therefore, KVM does not need to initialize or access private_spt. > + * KVM only interacts with sp->spt for mirrored EPT operations. > + */ > + sp->mirrored_spt = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_mirrored_spt_cache); > +} > + > static inline bool kvm_mmu_page_ad_need_write_protect(struct kvm_mmu_page *sp) > { > /* > diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c > index 1259dd63defc..e7cd4921afe7 100644 > --- a/arch/x86/kvm/mmu/tdp_mmu.c > +++ b/arch/x86/kvm/mmu/tdp_mmu.c > @@ -53,6 +53,7 @@ void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm) > > static void tdp_mmu_free_sp(struct kvm_mmu_page *sp) > { > + free_page((unsigned long)sp->mirrored_spt); > free_page((unsigned long)sp->spt); > kmem_cache_free(mmu_page_header_cache, sp); > } > -- > 2.34.1 >
On Thu, 2024-06-06 at 18:04 +0200, Paolo Bonzini wrote: > > PT: Page table > > Shared PT: visible to KVM, and the CPU uses it for shared mappings. > > Private/mirrored PT: the CPU uses it, but it is invisible to KVM. TDX > > module updates this table to map private guest pages. > > Mirror PT: It is visible to KVM, but the CPU doesn't use it. KVM uses it > > to propagate PT change to the actual private PT. > > Which one is the "Mirror" and which one is the "Mirrored" PT is > uncomfortably confusing. > > I hate to bikeshed even more, but while I like "Mirror PT" (a lot), I > would stick with "Private" or perhaps "External" for the pages owned > by the TDX module. Thanks. Yes, mirror and mirrored is too close. Kai raised the point that TDX's special need for separate, dedicated "private" PTEs is confusing from the SEV and SW protected VM perspective, so we've tried to remove private in that context. "External" seems like a good name. [snip] > > > +static inline void kvm_mmu_alloc_mirrored_spt(struct kvm_vcpu *vcpu, struct > > kvm_mmu_page *sp) > > +{ > > + /* > > + * mirrored_spt is allocated for TDX module to hold private EPT > > mappings, > > + * TDX module will initialize the page by itself. > > + * Therefore, KVM does not need to initialize or access > > mirrored_spt. > > + * KVM only interacts with sp->spt for mirrored EPT operations. > > + */ > > + sp->mirrored_spt = kvm_mmu_memory_cache_alloc(&vcpu- > > >arch.mmu_mirrored_spt_cache); > > +} > > + > > +static inline void kvm_mmu_alloc_private_spt(struct kvm_vcpu *vcpu, struct > > kvm_mmu_page *sp) > > +{ > > + /* > > + * private_spt is allocated for TDX module to hold private EPT > > mappings, > > + * TDX module will initialize the page by itself. > > + * Therefore, KVM does not need to initialize or access private_spt. > > + * KVM only interacts with sp->spt for mirrored EPT operations. > > + */ > > + sp->mirrored_spt = kvm_mmu_memory_cache_alloc(&vcpu- > > >arch.mmu_mirrored_spt_cache); > > +} > > Duplicate function. Argh, I thought I fixed this before sending it out. > > Naming aside, looks good. Great, thanks for the review! Snipped comments all sound good.
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index aabf1648a56a..250899a0239b 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -817,6 +817,11 @@ struct kvm_vcpu_arch { struct kvm_mmu_memory_cache mmu_shadow_page_cache; struct kvm_mmu_memory_cache mmu_shadowed_info_cache; struct kvm_mmu_memory_cache mmu_page_header_cache; + /* + * This cache is to allocate private page table. E.g. private EPT used + * by the TDX module. + */ + struct kvm_mmu_memory_cache mmu_mirrored_spt_cache; /* * QEMU userspace and the guest each have their own FPU state. diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h index dc80e72e4848..0c3bf89cf7db 100644 --- a/arch/x86/kvm/mmu.h +++ b/arch/x86/kvm/mmu.h @@ -318,4 +318,9 @@ static inline gpa_t kvm_translate_gpa(struct kvm_vcpu *vcpu, return gpa; return translate_nested_gpa(vcpu, gpa, access, exception); } + +static inline bool kvm_has_mirrored_tdp(const struct kvm *kvm) +{ + return kvm->arch.vm_type == KVM_X86_TDX_VM; +} #endif diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index b97241945596..5070ba7c6e89 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -685,6 +685,12 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indirect) 1 + PT64_ROOT_MAX_LEVEL + PTE_PREFETCH_NUM); if (r) return r; + if (kvm_has_mirrored_tdp(vcpu->kvm)) { + r = kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_mirrored_spt_cache, + PT64_ROOT_MAX_LEVEL); + if (r) + return r; + } r = kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_shadow_page_cache, PT64_ROOT_MAX_LEVEL); if (r) @@ -704,6 +710,7 @@ static void mmu_free_memory_caches(struct kvm_vcpu *vcpu) kvm_mmu_free_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache); kvm_mmu_free_memory_cache(&vcpu->arch.mmu_shadow_page_cache); kvm_mmu_free_memory_cache(&vcpu->arch.mmu_shadowed_info_cache); + kvm_mmu_free_memory_cache(&vcpu->arch.mmu_mirrored_spt_cache); kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_header_cache); } diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h index 706f0ce8784c..faef40a561f9 100644 --- a/arch/x86/kvm/mmu/mmu_internal.h +++ b/arch/x86/kvm/mmu/mmu_internal.h @@ -101,7 +101,22 @@ struct kvm_mmu_page { int root_count; refcount_t tdp_mmu_root_count; }; - unsigned int unsync_children; + union { + /* Those two members aren't used for TDP MMU */ + struct { + unsigned int unsync_children; + /* + * Number of writes since the last time traversal + * visited this page. + */ + atomic_t write_flooding_count; + }; + /* + * Page table page of private PT. + * Passed to TDX module, not accessed by KVM. + */ + void *mirrored_spt; + }; union { struct kvm_rmap_head parent_ptes; /* rmap pointers to parent sptes */ tdp_ptep_t ptep; @@ -124,9 +139,6 @@ struct kvm_mmu_page { int clear_spte_count; #endif - /* Number of writes since the last time traversal visited this page. */ - atomic_t write_flooding_count; - #ifdef CONFIG_X86_64 /* Used for freeing the page asynchronously if it is a TDP MMU page. */ struct rcu_head rcu_head; @@ -145,6 +157,33 @@ static inline int kvm_mmu_page_as_id(struct kvm_mmu_page *sp) return kvm_mmu_role_as_id(sp->role); } +static inline void *kvm_mmu_mirrored_spt(struct kvm_mmu_page *sp) +{ + return sp->mirrored_spt; +} + +static inline void kvm_mmu_alloc_mirrored_spt(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp) +{ + /* + * mirrored_spt is allocated for TDX module to hold private EPT mappings, + * TDX module will initialize the page by itself. + * Therefore, KVM does not need to initialize or access mirrored_spt. + * KVM only interacts with sp->spt for mirrored EPT operations. + */ + sp->mirrored_spt = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_mirrored_spt_cache); +} + +static inline void kvm_mmu_alloc_private_spt(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp) +{ + /* + * private_spt is allocated for TDX module to hold private EPT mappings, + * TDX module will initialize the page by itself. + * Therefore, KVM does not need to initialize or access private_spt. + * KVM only interacts with sp->spt for mirrored EPT operations. + */ + sp->mirrored_spt = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_mirrored_spt_cache); +} + static inline bool kvm_mmu_page_ad_need_write_protect(struct kvm_mmu_page *sp) { /* diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 1259dd63defc..e7cd4921afe7 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -53,6 +53,7 @@ void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm) static void tdp_mmu_free_sp(struct kvm_mmu_page *sp) { + free_page((unsigned long)sp->mirrored_spt); free_page((unsigned long)sp->spt); kmem_cache_free(mmu_page_header_cache, sp); }