Message ID: 20240619223614.290657-14-rick.p.edgecombe@intel.com (mailing list archive)
State: New, archived
Series: TDX MMU prep series part 1
On Wed, Jun 19, 2024 at 03:36:10PM -0700, Rick Edgecombe wrote:
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index e9c1783a8743..287dcc2685e4 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -3701,7 +3701,9 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
>  	int r;
>
>  	if (tdp_mmu_enabled) {
> -		kvm_tdp_mmu_alloc_root(vcpu);
> +		if (kvm_has_mirrored_tdp(vcpu->kvm))
> +			kvm_tdp_mmu_alloc_root(vcpu, true);
> +		kvm_tdp_mmu_alloc_root(vcpu, false);
>  		return 0;
>  	}
mmu_alloc_direct_roots() is called when vcpu->arch.mmu->root.hpa is INVALID_PAGE
in kvm_mmu_reload(), which could happen after the direct root is invalidated.

> -void kvm_tdp_mmu_alloc_root(struct kvm_vcpu *vcpu)
> +void kvm_tdp_mmu_alloc_root(struct kvm_vcpu *vcpu, bool mirror)
>  {
>  	struct kvm_mmu *mmu = vcpu->arch.mmu;
>  	union kvm_mmu_page_role role = mmu->root_role;
> @@ -241,6 +246,9 @@ void kvm_tdp_mmu_alloc_root(struct kvm_vcpu *vcpu)
>  	struct kvm *kvm = vcpu->kvm;
>  	struct kvm_mmu_page *root;
>
> +	if (mirror)
> +		role.is_mirror = 1;
> +
Could we add a validity check of mirror_root_hpa to prevent an incorrect ref
count increment of the mirror root?

+	if (mirror) {
+		if (mmu->mirror_root_hpa != INVALID_PAGE)
+			return;
+
+		role.is_mirror = true;
+	}

>  	/*
>  	 * Check for an existing root before acquiring the pages lock to avoid
>  	 * unnecessary serialization if multiple vCPUs are loading a new root.
> @@ -292,8 +300,12 @@ void kvm_tdp_mmu_alloc_root(struct kvm_vcpu *vcpu)
>  	 * and actually consuming the root if it's invalidated after dropping
>  	 * mmu_lock, and the root can't be freed as this vCPU holds a reference.
>  	 */
> -	mmu->root.hpa = __pa(root->spt);
> -	mmu->root.pgd = 0;
> +	if (mirror) {
> +		mmu->mirror_root_hpa = __pa(root->spt);
> +	} else {
> +		mmu->root.hpa = __pa(root->spt);
> +		mmu->root.pgd = 0;
> +	}
>  }
On Mon, 2024-06-24 at 16:30 +0800, Yan Zhao wrote:
> On Wed, Jun 19, 2024 at 03:36:10PM -0700, Rick Edgecombe wrote:
> > [...]
> >  	if (tdp_mmu_enabled) {
> > -		kvm_tdp_mmu_alloc_root(vcpu);
> > +		if (kvm_has_mirrored_tdp(vcpu->kvm))
> > +			kvm_tdp_mmu_alloc_root(vcpu, true);
> > +		kvm_tdp_mmu_alloc_root(vcpu, false);
> >  		return 0;
> >  	}
> mmu_alloc_direct_roots() is called when vcpu->arch.mmu->root.hpa is
> INVALID_PAGE in kvm_mmu_reload(), which could happen after the direct root is
> invalidated.
>
> > [...]
> > +	if (mirror)
> > +		role.is_mirror = 1;
> > +
> Could we add a validity check of mirror_root_hpa to prevent an incorrect ref
> count increment of the mirror root?

I was originally suspicious of the asymmetry of the tear down of mirror and
direct roots vs the allocation. Do you see a concrete problem, or just
advocating for safety?

> +	if (mirror) {
> +		if (mmu->mirror_root_hpa != INVALID_PAGE)
> +			return;
> +
> +		role.is_mirror = true;
> +	}
On Tue, Jun 25, 2024 at 08:51:33AM +0800, Edgecombe, Rick P wrote:
> On Mon, 2024-06-24 at 16:30 +0800, Yan Zhao wrote:
> > [...]
> > Could we add a validity check of mirror_root_hpa to prevent an incorrect
> > ref count increment of the mirror root?
>
> I was originally suspicious of the asymmetry of the tear down of mirror and
> direct roots vs the allocation. Do you see a concrete problem, or just
> advocating for safety?

IMO it's a concrete problem, though rare up to now, e.g.:

After repeatedly hot-plugging and hot-unplugging memory, which increases the
memslots generation, kvm_mmu_zap_all_fast() will be called to invalidate direct
roots when the memslots generation wraps around.

> > +	if (mirror) {
> > +		if (mmu->mirror_root_hpa != INVALID_PAGE)
> > +			return;
> > +
> > +		role.is_mirror = true;
> > +	}
On Tue, 2024-06-25 at 13:43 +0800, Yan Zhao wrote:
> > > I was originally suspicious of the asymmetry of the tear down of mirror
> > > and direct roots vs the allocation. Do you see a concrete problem, or
> > > just advocating for safety?
> IMO it's a concrete problem, though rare up to now, e.g.:
>
> After repeatedly hot-plugging and hot-unplugging memory, which increases the
> memslots generation, kvm_mmu_zap_all_fast() will be called to invalidate
> direct roots when the memslots generation wraps around.

Hmm, yes. I'm not sure about putting the check there though. It adds even more
confusion to the lifecycle:
 - the mirror_root_hpa != INVALID_PAGE check is in a different place than the
   root.hpa != INVALID_PAGE check
 - they get allocated in the same place
 - they are torn down in different places

Can you think of a clearer fix for it? Maybe we can just move the mirror root
allocation such that it's not subjected to the reload path? Like something that
matches the tear down in kvm_mmu_destroy()?
On Wed, Jun 26, 2024 at 04:33:20AM +0800, Edgecombe, Rick P wrote:
> On Tue, 2024-06-25 at 13:43 +0800, Yan Zhao wrote:
> > [...]
>
> Hmm, yes. I'm not sure about putting the check there though. It adds even
> more confusion to the lifecycle:
>  - the mirror_root_hpa != INVALID_PAGE check is in a different place than
>    the root.hpa != INVALID_PAGE check
>  - they get allocated in the same place
>  - they are torn down in different places
>
> Can you think of a clearer fix for it? Maybe we can just move the mirror
> root allocation such that it's not subjected to the reload path? Like
> something that matches the tear down in kvm_mmu_destroy()?

But we still need the reload path to have each vcpu hold a ref count of the
mirror root. What about the fix below?
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 026e8edfb0bd..4decd13457ec 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -127,9 +127,28 @@ void kvm_mmu_sync_prev_roots(struct kvm_vcpu *vcpu);
 void kvm_mmu_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
 			 int bytes);
 
+static inline bool kvm_has_mirrored_tdp(const struct kvm *kvm)
+{
+	return kvm->arch.vm_type == KVM_X86_TDX_VM;
+}
+
+static inline bool kvm_mmu_root_hpa_is_invalid(struct kvm_vcpu *vcpu)
+{
+	return vcpu->arch.mmu->root.hpa == INVALID_PAGE;
+}
+
+static inline bool kvm_mmu_mirror_root_hpa_is_invalid(struct kvm_vcpu *vcpu)
+{
+	if (!kvm_has_mirrored_tdp(vcpu->kvm))
+		return false;
+
+	return vcpu->arch.mmu->mirror_root_hpa == INVALID_PAGE;
+}
+
 static inline int kvm_mmu_reload(struct kvm_vcpu *vcpu)
 {
-	if (likely(vcpu->arch.mmu->root.hpa != INVALID_PAGE))
+	if (!kvm_mmu_root_hpa_is_invalid(vcpu) &&
+	    !kvm_mmu_mirror_root_hpa_is_invalid(vcpu))
 		return 0;
 
 	return kvm_mmu_load(vcpu);
@@ -322,11 +341,6 @@ static inline gpa_t kvm_translate_gpa(struct kvm_vcpu *vcpu,
 	return translate_nested_gpa(vcpu, gpa, access, exception);
 }
 
-static inline bool kvm_has_mirrored_tdp(const struct kvm *kvm)
-{
-	return kvm->arch.vm_type == KVM_X86_TDX_VM;
-}
-
 static inline gfn_t kvm_gfn_direct_bits(const struct kvm *kvm)
 {
 	return kvm->arch.gfn_direct_bits;
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index e1299eb03e63..5e7d92074f70 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3702,9 +3702,11 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
 	int r;
 
 	if (tdp_mmu_enabled) {
-		if (kvm_has_mirrored_tdp(vcpu->kvm))
+		if (kvm_mmu_mirror_root_hpa_is_invalid(vcpu))
 			kvm_tdp_mmu_alloc_root(vcpu, true);
-		kvm_tdp_mmu_alloc_root(vcpu, false);
+
+		if (kvm_mmu_root_hpa_is_invalid(vcpu))
+			kvm_tdp_mmu_alloc_root(vcpu, false);
 		return 0;
 	}
On Wed, 2024-06-26 at 13:05 +0800, Yan Zhao wrote:
> But we still need the reload path to have each vcpu hold a ref count of the
> mirror root. What about the fix below?
>
> [...]
> static inline int kvm_mmu_reload(struct kvm_vcpu *vcpu)
> {
> -	if (likely(vcpu->arch.mmu->root.hpa != INVALID_PAGE))
> +	if (!kvm_mmu_root_hpa_is_invalid(vcpu) &&
> +	    !kvm_mmu_mirror_root_hpa_is_invalid(vcpu))

If we go this way, we should keep the likely. But, I'm not convinced.

> [...]
>  	if (tdp_mmu_enabled) {
> -		if (kvm_has_mirrored_tdp(vcpu->kvm))
> +		if (kvm_mmu_mirror_root_hpa_is_invalid(vcpu))
>  			kvm_tdp_mmu_alloc_root(vcpu, true);
> -		kvm_tdp_mmu_alloc_root(vcpu, false);
> +
> +		if (kvm_mmu_root_hpa_is_invalid(vcpu))
> +			kvm_tdp_mmu_alloc_root(vcpu, false);

So we can have either:
	mirror_root_hpa = INVALID_PAGE
	root.hpa = INVALID_PAGE
or:
	mirror_root_hpa = root
	root.hpa = INVALID_PAGE

We don't ever have:
	mirror_root_hpa = INVALID_PAGE
	root.hpa = root

Because mirror_root_hpa is special.

>  		return 0;
>  	}

Having the check in both kvm_mmu_reload() and here is really ugly. Right now we
have kvm_mmu_reload(), which only allocates new roots if needed, then
kvm_mmu_load() goes and allocates/references them. Now mmu_alloc_direct_roots()
has the same logic to only reload as needed.

So could we just leave the kvm_mmu_reload() logic as is, and just have special
logic in mmu_alloc_direct_roots() to avoid re-allocating mirror roots? This is
actually what v19 had, but it was thought to be a historical relic:
"Historically we needed it. We don't need it now. We can drop it."
https://lore.kernel.org/kvm/20240325200122.GH2357401@ls.amr.corp.intel.com/

So just have this small fixup instead?
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index ae5d5dee86af..2e062178222e 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3705,7 +3705,8 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
 	int r;
 
 	if (tdp_mmu_enabled) {
-		if (kvm_has_mirrored_tdp(vcpu->kvm))
+		if (kvm_has_mirrored_tdp(vcpu->kvm) &&
+		    !VALID_PAGE(mmu->mirror_root_hpa))
 			kvm_tdp_mmu_alloc_root(vcpu, true);
 		kvm_tdp_mmu_alloc_root(vcpu, false);
 		return 0;
On Thu, Jul 04, 2024 at 03:40:35AM +0800, Edgecombe, Rick P wrote:
> On Wed, 2024-06-26 at 13:05 +0800, Yan Zhao wrote:
> > But we still need the reload path to have each vcpu hold a ref count of
> > the mirror root. What about the fix below?
> > [...]
>
> If we go this way, we should keep the likely. But, I'm not convinced.
>
> So we can have either:
> 	mirror_root_hpa = INVALID_PAGE
> 	root.hpa = INVALID_PAGE
> or:
> 	mirror_root_hpa = root
> 	root.hpa = INVALID_PAGE
>
> We don't ever have:
> 	mirror_root_hpa = INVALID_PAGE
> 	root.hpa = root
>
> Because mirror_root_hpa is special.

Good point.

> Having the check in both kvm_mmu_reload() and here is really ugly. Right now
> we have kvm_mmu_reload(), which only allocates new roots if needed, then
> kvm_mmu_load() goes and allocates/references them. Now
> mmu_alloc_direct_roots() has the same logic to only reload as needed.
>
> So could we just leave the kvm_mmu_reload() logic as is, and just have
> special logic in mmu_alloc_direct_roots() to avoid re-allocating mirror
> roots? This is actually what v19 had, but it was thought to be a historical
> relic:
> "Historically we needed it. We don't need it now. We can drop it."
> https://lore.kernel.org/kvm/20240325200122.GH2357401@ls.amr.corp.intel.com/
>
> So just have this small fixup instead?

Yes.

> [...]

Perhaps also add a comment in kvm_mmu_reload() to address concerns like why
checking only root.hpa in kvm_mmu_reload() is enough:

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 03737f3aaeeb..aba98c8cc67d 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -129,6 +129,15 @@ void kvm_mmu_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
 
 static inline int kvm_mmu_reload(struct kvm_vcpu *vcpu)
 {
+	/*
+	 * Checking root.hpa is sufficient even when KVM has mirror root.
+	 * We can have either:
+	 * (1) mirror_root_hpa = INVALID_PAGE, root.hpa = INVALID_PAGE
+	 * (2) mirror_root_hpa = root , root.hpa = INVALID_PAGE
+	 * (3) mirror_root_hpa = root1, root.hpa = root2
+	 * We don't ever have:
+	 *     mirror_root_hpa = INVALID_PAGE, root.hpa = root
+	 */
 	if (likely(vcpu->arch.mmu->root.hpa != INVALID_PAGE))
 		return 0;
 
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index a5f803f1d17e..eee35e958971 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3705,7 +3705,8 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
 	int r;
 
 	if (tdp_mmu_enabled) {
-		if (kvm_has_mirrored_tdp(vcpu->kvm))
+		if (kvm_has_mirrored_tdp(vcpu->kvm) &&
+		    !VALID_PAGE(mmu->mirror_root_hpa))
 			kvm_tdp_mmu_alloc_root(vcpu, true);
 		kvm_tdp_mmu_alloc_root(vcpu, false);
 		return 0;
On Wed, Jun 19, 2024 at 03:36:10PM -0700, Rick Edgecombe wrote: > diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h > index b887c225ff24..2903f03a34be 100644 > --- a/arch/x86/kvm/mmu/tdp_mmu.h > +++ b/arch/x86/kvm/mmu/tdp_mmu.h > @@ -10,7 +10,7 @@ > void kvm_mmu_init_tdp_mmu(struct kvm *kvm); > void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm); > > -void kvm_tdp_mmu_alloc_root(struct kvm_vcpu *vcpu); > +void kvm_tdp_mmu_alloc_root(struct kvm_vcpu *vcpu, bool private); > > __must_check static inline bool kvm_tdp_mmu_get_root(struct kvm_mmu_page *root) This function name is very similar to below tdp_mmu_get_root(). > +static inline struct kvm_mmu_page *tdp_mmu_get_root_for_fault(struct kvm_vcpu *vcpu, > + struct kvm_page_fault *fault) > +{ > + if (unlikely(!kvm_is_addr_direct(vcpu->kvm, fault->addr))) > + return root_to_sp(vcpu->arch.mmu->mirror_root_hpa); > + > + return root_to_sp(vcpu->arch.mmu->root.hpa); > +} > + > +static inline struct kvm_mmu_page *tdp_mmu_get_root(struct kvm_vcpu *vcpu, > + enum kvm_tdp_mmu_root_types type) > +{ > + if (unlikely(type == KVM_MIRROR_ROOTS)) > + return root_to_sp(vcpu->arch.mmu->mirror_root_hpa); > + > + return root_to_sp(vcpu->arch.mmu->root.hpa); > +} No need to put the two functions in tdp_mmu.h as they are not called in mmu.c. Could we move them to tdp_mmu.c and rename them to something like tdp_mmu_type_to_root() and tdp_mmu_fault_to_root() ?
On Thu, 2024-07-04 at 16:09 +0800, Yan Zhao wrote:
> Perhaps also add a comment in kvm_mmu_reload() to address concerns like why
> checking only root.hpa in kvm_mmu_reload() is enough.

Sounds good, and thanks again for catching this.

> +	/*
> +	 * Checking root.hpa is sufficient even when KVM has mirror root.
> +	 * We can have either:
> +	 * (1) mirror_root_hpa = INVALID_PAGE, root.hpa = INVALID_PAGE
> +	 * (2) mirror_root_hpa = root , root.hpa = INVALID_PAGE

Looks good to me except for the stray space before the comma ^

> +	 * (3) mirror_root_hpa = root1, root.hpa = root2
> +	 * We don't ever have:
> +	 *     mirror_root_hpa = INVALID_PAGE, root.hpa = root
> +	 */
>  	if (likely(vcpu->arch.mmu->root.hpa != INVALID_PAGE))
>  		return 0;
> [...]
On Thu, 2024-07-04 at 16:51 +0800, Yan Zhao wrote:
> > __must_check static inline bool kvm_tdp_mmu_get_root(struct kvm_mmu_page *root)
> This function name is very similar to tdp_mmu_get_root() below.

That is true.

> There is no need to put the two functions in tdp_mmu.h, as they are not
> called in mmu.c.

I think it is nice to have the root/enum logic above close together.

> Could we move them to tdp_mmu.c and rename them to something like
> tdp_mmu_type_to_root() and tdp_mmu_fault_to_root()?

tdp_mmu_get_root_for_fault() was proposed by Paolo, and tdp_mmu_get_root() was
discussed without comment. Not to say there is anything wrong with the names
proposed, but I think this is wading into bikeshedding territory at this stage.
On Tue, 2024-07-09 at 15:38 -0700, Rick Edgecombe wrote:
> > Could we move them to tdp_mmu.c and rename them to something like
> > tdp_mmu_type_to_root() and tdp_mmu_fault_to_root()?
>
> tdp_mmu_get_root_for_fault() was proposed by Paolo, and tdp_mmu_get_root()
> was discussed without comment. Not to say there is anything wrong with the
> names proposed, but I think this is wading into bikeshedding territory at
> this stage.

I kept thinking about this comment. On more consideration, tdp_mmu_get_root()
is a problematically terrible name, since "get" usually means taking a
reference to something, which is something you can do to roots (and what
kvm_tdp_mmu_get_root() is doing). So people might think tdp_mmu_get_root() is
taking a reference.
On Fri, Jul 12, 2024 at 07:54:50AM +0800, Edgecombe, Rick P wrote:
> I kept thinking about this comment. On more consideration,
> tdp_mmu_get_root() is a problematically terrible name, since "get" usually
> means taking a reference to something, which is something you can do to
> roots (and what kvm_tdp_mmu_get_root() is doing). So people might think
> tdp_mmu_get_root() is taking a reference.

Right, I have exactly the same feeling about "get".
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 6e07eff06a58..d67e88a69fc4 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -460,6 +460,7 @@ struct kvm_mmu {
 	int (*sync_spte)(struct kvm_vcpu *vcpu,
 			 struct kvm_mmu_page *sp, int i);
 	struct kvm_mmu_root_info root;
+	hpa_t mirror_root_hpa;
 	union kvm_cpu_role cpu_role;
 	union kvm_mmu_page_role root_role;
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 63179a4fba7b..7b12ba761c51 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -328,4 +328,11 @@ static inline gfn_t kvm_gfn_direct_bits(const struct kvm *kvm)
 {
 	return kvm->arch.gfn_direct_bits;
 }
+
+static inline bool kvm_is_addr_direct(struct kvm *kvm, gpa_t gpa)
+{
+	gpa_t gpa_direct_bits = gfn_to_gpa(kvm_gfn_direct_bits(kvm));
+
+	return !gpa_direct_bits || (gpa & gpa_direct_bits);
+}
 #endif
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index e9c1783a8743..287dcc2685e4 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3701,7 +3701,9 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
 	int r;
 
 	if (tdp_mmu_enabled) {
-		kvm_tdp_mmu_alloc_root(vcpu);
+		if (kvm_has_mirrored_tdp(vcpu->kvm))
+			kvm_tdp_mmu_alloc_root(vcpu, true);
+		kvm_tdp_mmu_alloc_root(vcpu, false);
 		return 0;
 	}
@@ -6245,6 +6247,7 @@ static int __kvm_mmu_create(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu)
 
 	mmu->root.hpa = INVALID_PAGE;
 	mmu->root.pgd = 0;
+	mmu->mirror_root_hpa = INVALID_PAGE;
 	for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++)
 		mmu->prev_roots[i] = KVM_MMU_ROOT_INFO_INVALID;
@@ -7220,6 +7223,12 @@ int kvm_mmu_vendor_module_init(void)
 void kvm_mmu_destroy(struct kvm_vcpu *vcpu)
 {
 	kvm_mmu_unload(vcpu);
+	if (tdp_mmu_enabled) {
+		read_lock(&vcpu->kvm->mmu_lock);
+		mmu_free_root_page(vcpu->kvm, &vcpu->arch.mmu->mirror_root_hpa,
+				   NULL);
+		read_unlock(&vcpu->kvm->mmu_lock);
+	}
 	free_mmu_pages(&vcpu->arch.root_mmu);
 	free_mmu_pages(&vcpu->arch.guest_mmu);
 	mmu_free_memory_caches(vcpu);
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 2200bdc7681f..a0010c62425f 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -95,10 +95,15 @@ void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root)
 static bool tdp_mmu_root_match(struct kvm_mmu_page *root,
 			       enum kvm_tdp_mmu_root_types types)
 {
+	if (WARN_ON_ONCE(!(types & KVM_VALID_ROOTS)))
+		return false;
+
 	if (root->role.invalid)
 		return types & KVM_INVALID_ROOTS;
+	if (likely(!is_mirror_sp(root)))
+		return types & KVM_DIRECT_ROOTS;
 
-	return true;
+	return types & KVM_MIRROR_ROOTS;
 }
@@ -233,7 +238,7 @@ static void tdp_mmu_init_child_sp(struct kvm_mmu_page *child_sp,
 	tdp_mmu_init_sp(child_sp, iter->sptep, iter->gfn, role);
 }
 
-void kvm_tdp_mmu_alloc_root(struct kvm_vcpu *vcpu)
+void kvm_tdp_mmu_alloc_root(struct kvm_vcpu *vcpu, bool mirror)
 {
 	struct kvm_mmu *mmu = vcpu->arch.mmu;
 	union kvm_mmu_page_role role = mmu->root_role;
@@ -241,6 +246,9 @@ void kvm_tdp_mmu_alloc_root(struct kvm_vcpu *vcpu)
 	struct kvm *kvm = vcpu->kvm;
 	struct kvm_mmu_page *root;
 
+	if (mirror)
+		role.is_mirror = 1;
+
 	/*
 	 * Check for an existing root before acquiring the pages lock to avoid
 	 * unnecessary serialization if multiple vCPUs are loading a new root.
@@ -292,8 +300,12 @@ void kvm_tdp_mmu_alloc_root(struct kvm_vcpu *vcpu)
 	 * and actually consuming the root if it's invalidated after dropping
 	 * mmu_lock, and the root can't be freed as this vCPU holds a reference.
 	 */
-	mmu->root.hpa = __pa(root->spt);
-	mmu->root.pgd = 0;
+	if (mirror) {
+		mmu->mirror_root_hpa = __pa(root->spt);
+	} else {
+		mmu->root.hpa = __pa(root->spt);
+		mmu->root.pgd = 0;
+	}
 }
 
 static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
@@ -1113,8 +1125,8 @@ static int tdp_mmu_split_huge_page(struct kvm *kvm, struct tdp_iter *iter,
  */
 int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 {
+	struct kvm_mmu_page *root = tdp_mmu_get_root_for_fault(vcpu, fault);
 	struct kvm *kvm = vcpu->kvm;
-	struct kvm_mmu_page *root = root_to_sp(vcpu->arch.mmu->root.hpa);
 	struct tdp_iter iter;
 	struct kvm_mmu_page *sp;
 	int ret = RET_PF_RETRY;
@@ -1152,13 +1164,18 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 		 */
 		sp = tdp_mmu_alloc_sp(vcpu);
 		tdp_mmu_init_child_sp(sp, &iter);
+		if (is_mirror_sp(sp))
+			kvm_mmu_alloc_external_spt(vcpu, sp);
 
 		sp->nx_huge_page_disallowed = fault->huge_page_disallowed;
 
-		if (is_shadow_present_pte(iter.old_spte))
+		if (is_shadow_present_pte(iter.old_spte)) {
+			/* Don't support large page for mirrored roots (TDX) */
+			KVM_BUG_ON(is_mirror_sptep(iter.sptep), vcpu->kvm);
 			r = tdp_mmu_split_huge_page(kvm, &iter, sp, true);
-		else
+		} else {
 			r = tdp_mmu_link_sp(kvm, &iter, sp, true);
+		}
 
 		/*
 		 * Force the guest to retry if installing an upper level SPTE
@@ -1813,7 +1830,8 @@ int kvm_tdp_mmu_get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes,
 u64 *kvm_tdp_mmu_fast_pf_get_last_sptep(struct kvm_vcpu *vcpu, gfn_t gfn,
 					u64 *spte)
 {
-	struct kvm_mmu_page *root = root_to_sp(vcpu->arch.mmu->root.hpa);
+	/* Fast pf is not supported for mirrored roots */
+	struct kvm_mmu_page *root = tdp_mmu_get_root(vcpu, KVM_DIRECT_ROOTS);
 	struct tdp_iter iter;
 	tdp_ptep_t sptep = NULL;
 
diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
index b887c225ff24..2903f03a34be 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.h
+++ b/arch/x86/kvm/mmu/tdp_mmu.h
@@ -10,7 +10,7 @@
 void kvm_mmu_init_tdp_mmu(struct kvm *kvm);
 void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm);
 
-void kvm_tdp_mmu_alloc_root(struct kvm_vcpu *vcpu);
+void kvm_tdp_mmu_alloc_root(struct kvm_vcpu *vcpu, bool private);
 
 __must_check static inline bool kvm_tdp_mmu_get_root(struct kvm_mmu_page *root)
 {
@@ -21,11 +21,48 @@ void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root);
 
 enum kvm_tdp_mmu_root_types {
 	KVM_INVALID_ROOTS = BIT(0),
-
-	KVM_VALID_ROOTS = BIT(1),
+	KVM_DIRECT_ROOTS = BIT(1),
+	KVM_MIRROR_ROOTS = BIT(2),
 
+	KVM_VALID_ROOTS = KVM_DIRECT_ROOTS | KVM_MIRROR_ROOTS,
 	KVM_ALL_ROOTS = KVM_VALID_ROOTS | KVM_INVALID_ROOTS,
 };
 
+static inline enum kvm_tdp_mmu_root_types kvm_gfn_range_filter_to_root_types(struct kvm *kvm,
+							enum kvm_gfn_range_filter process)
+{
+	enum kvm_tdp_mmu_root_types ret = 0;
+
+	if (!kvm_has_mirrored_tdp(kvm))
+		return KVM_DIRECT_ROOTS;
+
+	if (process & KVM_FILTER_PRIVATE)
+		ret |= KVM_MIRROR_ROOTS;
+	if (process & KVM_FILTER_SHARED)
+		ret |= KVM_DIRECT_ROOTS;
+
+	WARN_ON_ONCE(!ret);
+
+	return ret;
+}
+
+static inline struct kvm_mmu_page *tdp_mmu_get_root_for_fault(struct kvm_vcpu *vcpu,
+							      struct kvm_page_fault *fault)
+{
+	if (unlikely(!kvm_is_addr_direct(vcpu->kvm, fault->addr)))
+		return root_to_sp(vcpu->arch.mmu->mirror_root_hpa);
+
+	return root_to_sp(vcpu->arch.mmu->root.hpa);
+}
+
+static inline struct kvm_mmu_page *tdp_mmu_get_root(struct kvm_vcpu *vcpu,
+						    enum kvm_tdp_mmu_root_types type)
+{
+	if (unlikely(type == KVM_MIRROR_ROOTS))
+		return root_to_sp(vcpu->arch.mmu->mirror_root_hpa);
+
+	return root_to_sp(vcpu->arch.mmu->root.hpa);
+}
+
 bool kvm_tdp_mmu_zap_leafs(struct kvm *kvm, gfn_t start, gfn_t end, bool flush);
 bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp);
 void kvm_tdp_mmu_zap_all(struct kvm *kvm);