Message ID | 20211110223010.1392399-12-bgardon@google.com (mailing list archive) |
---|---|
State | New, archived |
Series | KVM: x86/mmu: Optimize disabling dirty logging |
On 11/10/21 23:30, Ben Gardon wrote:
> -	WARN_ONCE(is_rsvd_spte(&vcpu->arch.mmu->shadow_zero_check, spte, level),
> +	WARN_ONCE(is_rsvd_spte(shadow_zero_check, spte, level),
>  		  "spte = 0x%llx, level = %d, rsvd bits = 0x%llx", spte, level,
> -		  get_rsvd_bits(&vcpu->arch.mmu->shadow_zero_check, spte, level));
> +		  get_rsvd_bits(shadow_zero_check, spte, level));

Hmm, there is a deeper issue here, in that when using EPT/NPT (on either
the legacy aka shadow or the TDP MMU) large parts of vcpu->arch.mmu are
really the same for all vCPUs.  The only thing that varies is those
parts that actually depend on the guest's paging mode---the extended
role, the reserved bits, etc.  Those are needed by the emulator, but
don't really belong in vcpu->arch.mmu when EPT/NPT is in use.

I wonder if there's room for splitting kvm_mmu in two parts, such as
kvm_mmu and kvm_guest_paging_context, and possibly changing the walk_mmu
pointer into a pointer to kvm_guest_paging_context.  This way the
EPT/NPT MMU (again, either shadow or TDP) can be moved to kvm->arch.  It
should simplify this series and also David's work on eager page splitting.

I'm not asking you to do this, of course, but perhaps I can trigger
Sean's itch to refactor stuff. :)

Paolo
On Wed, Nov 10, 2021 at 2:45 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 11/10/21 23:30, Ben Gardon wrote:
> > -	WARN_ONCE(is_rsvd_spte(&vcpu->arch.mmu->shadow_zero_check, spte, level),
> > +	WARN_ONCE(is_rsvd_spte(shadow_zero_check, spte, level),
> >  		  "spte = 0x%llx, level = %d, rsvd bits = 0x%llx", spte, level,
> > -		  get_rsvd_bits(&vcpu->arch.mmu->shadow_zero_check, spte, level));
> > +		  get_rsvd_bits(shadow_zero_check, spte, level));
>
> Hmm, there is a deeper issue here, in that when using EPT/NPT (on either
> the legacy aka shadow or the TDP MMU) large parts of vcpu->arch.mmu are
> really the same for all vCPUs.  The only thing that varies is those
> parts that actually depend on the guest's paging mode---the extended
> role, the reserved bits, etc.  Those are needed by the emulator, but
> don't really belong in vcpu->arch.mmu when EPT/NPT is in use.
>
> I wonder if there's room for splitting kvm_mmu in two parts, such as
> kvm_mmu and kvm_guest_paging_context, and possibly changing the walk_mmu
> pointer into a pointer to kvm_guest_paging_context.  This way the
> EPT/NPT MMU (again, either shadow or TDP) can be moved to kvm->arch.  It
> should simplify this series and also David's work on eager page splitting.
>
> I'm not asking you to do this, of course, but perhaps I can trigger
> Sean's itch to refactor stuff. :)
>
> Paolo

I think that's a great idea.  I'm frequently confused as to why struct
kvm_mmu is a per-vCPU construct as opposed to being VM-global.  Moving
part of the struct to be a member of struct kvm would also open the door
to formalizing the MMU interface a little better, and perhaps even reveal
more MMU code that can be consolidated across architectures.
On Wed, Nov 10, 2021, Ben Gardon wrote:
> On Wed, Nov 10, 2021 at 2:45 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
> >
> > On 11/10/21 23:30, Ben Gardon wrote:
> > > -	WARN_ONCE(is_rsvd_spte(&vcpu->arch.mmu->shadow_zero_check, spte, level),
> > > +	WARN_ONCE(is_rsvd_spte(shadow_zero_check, spte, level),
> > >  		  "spte = 0x%llx, level = %d, rsvd bits = 0x%llx", spte, level,
> > > -		  get_rsvd_bits(&vcpu->arch.mmu->shadow_zero_check, spte, level));
> > > +		  get_rsvd_bits(shadow_zero_check, spte, level));
> >
> > Hmm, there is a deeper issue here, in that when using EPT/NPT (on either
> > the legacy aka shadow or the TDP MMU) large parts of vcpu->arch.mmu are
> > really the same for all vCPUs.  The only thing that varies is those
> > parts that actually depend on the guest's paging mode---the extended
> > role, the reserved bits, etc.  Those are needed by the emulator, but
> > don't really belong in vcpu->arch.mmu when EPT/NPT is in use.
> >
> > I wonder if there's room for splitting kvm_mmu in two parts, such as
> > kvm_mmu and kvm_guest_paging_context, and possibly changing the walk_mmu
> > pointer into a pointer to kvm_guest_paging_context.  This way the
> > EPT/NPT MMU (again, either shadow or TDP) can be moved to kvm->arch.  It
> > should simplify this series and also David's work on eager page splitting.
> >
> > I'm not asking you to do this, of course, but perhaps I can trigger
> > Sean's itch to refactor stuff. :)
> >
> > Paolo
>
> I think that's a great idea.  I'm frequently confused as to why struct
> kvm_mmu is a per-vCPU construct as opposed to being VM-global.  Moving
> part of the struct to be a member of struct kvm would also open the door
> to formalizing the MMU interface a little better, and perhaps even reveal
> more MMU code that can be consolidated across architectures.

But what would you actually move?  Even shadow_zero_check barely squeaks
by, e.g. if NX is ever used for NPT, then maybe it stops being a per-VM
setting.

Going through the fields...  These are all related to guest context:

	unsigned long (*get_guest_pgd)(struct kvm_vcpu *vcpu);
	u64 (*get_pdptr)(struct kvm_vcpu *vcpu, int index);
	int (*page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
	void (*inject_page_fault)(struct kvm_vcpu *vcpu,
				  struct x86_exception *fault);
	gpa_t (*gva_to_gpa)(struct kvm_vcpu *vcpu, gpa_t gva_or_gpa, u32 access,
			    struct x86_exception *exception);
	gpa_t (*translate_gpa)(struct kvm_vcpu *vcpu, gpa_t gpa, u32 access,
			       struct x86_exception *exception);
	int (*sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp);
	void (*invlpg)(struct kvm_vcpu *vcpu, gva_t gva, hpa_t root_hpa);
	union kvm_mmu_role mmu_role;
	u8 root_level;
	u8 permissions[16];
	u32 pkru_mask;
	struct rsvd_bits_validate guest_rsvd_check;
	u64 pdptrs[4];
	gpa_t root_pgd;

One field, ept_ad, can be straight deleted as it's redundant with respect
to the above mmu_role.ad_disabled.

	u8 ept_ad;

Ditto for the direct_map flag (mmu_role.direct) and shadow_root_level
(mmu_role.level).  I haven't bothered to yank those because they have a
lot of touchpoints.

	bool direct_map;
	u8 shadow_root_level;

The prev_roots could be dropped if TDP roots were tracked per-VM, but
we'd still want an equivalent for !TDP and nTDP MMUs.

	struct kvm_mmu_root_info prev_roots[KVM_MMU_NUM_PREV_ROOTS];

shadow_zero_check can be made per-VM if all vCPUs are required to have
the same cpuid.MAXPHYADDR, or if we remove the (IMO) pointless 5-level
vs. 4-level behavior, which, by the by, has my vote since we could make
shadow_zero_check _global_, not just per-VM, and everything I've heard
is that the extra level has no measurable performance overhead.

	struct rsvd_bits_validate shadow_zero_check;

And that leaves us with:

	hpa_t root_hpa;
	u64 *pae_root;
	u64 *pml4_root;
	u64 *pml5_root;

Of those, _none_ of them can be per-VM, because they are all nothing more
than shadow pages, and thus cannot be per-VM unless there is exactly one
set of TDP page tables for the guest.  Even if/when we strip the
unnecessary role bits from these for TDP (on my todo list), we still need
up to three sets of page tables:

  1. Normal
  2. SMM
  3. Guest (if L1 doesn't use TDP)

So I suppose we could refactor KVM to explicitly track its three possible
TDP roots, but I don't think it buys us anything and would complicate
supporting !TDP as well as nTDP.
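As an aside, a minimal sketch of the "global shadow_zero_check" variant floated above, computed once at MMU hardware setup instead of per vCPU.  The names tdp_shadow_rsvd_bits and kvm_calc_tdp_rsvd_bits() are hypothetical, introduced only for illustration; they are not existing KVM symbols.

	/*
	 * Illustrative sketch: compute the non-nested TDP reserved-bit check
	 * once, globally, always for 5-level paging.  Assumption: for
	 * non-nested TDP the reserved bits depend only on host properties
	 * (MAXPHYADDR, EPT vs. NPT format), so index [4] can be filled in
	 * unconditionally and 4-level roots simply never look at it.
	 */
	static struct rsvd_bits_validate tdp_shadow_rsvd_bits;

	void __init kvm_mmu_setup_tdp_rsvd_bits(void)
	{
		/* kvm_calc_tdp_rsvd_bits() is a stand-in for the real
		 * reserved-bits computation in mmu.c. */
		kvm_calc_tdp_rsvd_bits(&tdp_shadow_rsvd_bits, /*level=*/5);
	}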
On Thu, Nov 11, 2021, Sean Christopherson wrote:
> These are all related to guest context:
>
> 	int (*page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
> 	void (*inject_page_fault)(struct kvm_vcpu *vcpu,
> 				  struct x86_exception *fault);

That's incorrect, page_fault() and inject_page_fault() could be per-VM.
Neither is particularly interesting though.
On 11/11/21 02:18, Sean Christopherson wrote:
> But what would you actually move?  Even shadow_zero_check barely squeaks by,
> e.g. if NX is ever used for NPT, then maybe it stops being a per-VM setting.

Hmm, I think it would still be per-VM, just like 32-bit shadow page
tables are always built for EFER.NXE=CR4.PAE=1.

Anyway, the rough sketch is to have three structs:

* struct kvm_mmu_kind has the function pointers and the state that is
  needed to operate on page tables

* struct kvm_mmu has the function pointers and the state that is needed
  while the vCPU runs, including the role

* struct kvm_paging_context has the stuff related to emulation; shadow
  page tables of course need it, but EPT/NPT do not (with either the old
  or the new MMU)

So you'd have a "struct kvm_mmu_kind direct_mmu" in struct kvm_arch (for
either legacy EPT/NPT or the new MMU), and

	struct kvm_mmu_kind shadow_mmu;

	struct kvm_mmu root_mmu;		/* either TDP or shadow */
	struct kvm_mmu tdp12_mmu;		/* always shadow */
	struct kvm_mmu *mmu;			/* either &kvm->direct_mmu or &vcpu->shadow_mmu */

	struct kvm_paging_context root_walk;	/* maybe unified with walk01 below? dunno yet */
	struct kvm_paging_context walk01;
	struct kvm_paging_context walk12;
	struct kvm_paging_context *walk;	/* either &vcpu->root_walk or &vcpu->walk12 */

in struct kvm_vcpu_arch.  struct kvm_mmu has a pointer to struct
kvm_mmu_kind; however, if an spte.c function does not need the data in
struct kvm_mmu, it can take a struct kvm_mmu_kind * and it won't need a
vCPU.  Likewise, the TDP MMU knows its kvm_mmu_kind is always in
&kvm->direct_mmu, so it can take a struct kvm * if the struct kvm_mmu is
not needed.

The first part of the refactoring would be to kill the nested_mmu and
walk_mmu, replacing them with &vcpu->walk12 and vcpu->walk respectively.
The half-useless nested_mmu has always bothered me; I was going to play
with it anyway because I want to remove the kvm_mmu_reset_context from
CR0.WP writes, so I'll see if I get something useful out of it.

Paolo
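To make the shape of that sketch a bit more concrete, here is a rough, purely illustrative C rendering of the proposed split.  The struct names follow the proposal above, the handful of fields are taken from the list earlier in the thread, and none of this is the actual KVM layout; the *_sketch containers stand in for additions to struct kvm_arch and struct kvm_vcpu_arch.

	/* Illustration only; field selection and placement are assumptions. */
	struct kvm_mmu_kind {
		/* Operates on page tables; constant for the VM, no vCPU needed. */
		int (*page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
		struct rsvd_bits_validate shadow_zero_check;
	};

	struct kvm_paging_context {
		/* Guest paging-mode state that the emulator needs. */
		struct rsvd_bits_validate guest_rsvd_check;
		u8 root_level;
		u8 permissions[16];
		u32 pkru_mask;
	};

	struct kvm_mmu {
		struct kvm_mmu_kind *kind;	/* &kvm->arch.direct_mmu or &vcpu->arch.shadow_mmu */
		union kvm_mmu_role mmu_role;	/* varies while the vCPU runs */
		hpa_t root_hpa;
	};

	struct kvm_arch_sketch {		/* stand-in for fields added to struct kvm_arch */
		struct kvm_mmu_kind direct_mmu;	/* legacy EPT/NPT or the TDP MMU */
	};

	struct kvm_vcpu_arch_sketch {		/* stand-in for fields added to struct kvm_vcpu_arch */
		struct kvm_mmu_kind shadow_mmu;
		struct kvm_mmu root_mmu;	/* either TDP or shadow */
		struct kvm_mmu tdp12_mmu;	/* always shadow */
		struct kvm_mmu *mmu;
		struct kvm_paging_context walk01;
		struct kvm_paging_context walk12;
		struct kvm_paging_context *walk;	/* &walk01 or &walk12 */
	};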
On Wed, Nov 10, 2021, Ben Gardon wrote:
> In the interest of developing a version of make_spte that can function
> without a vCPU pointer, factor out the shadow_zero_mask to be an
> additional argument to the function.
>
> No functional change intended.
>
> Signed-off-by: Ben Gardon <bgardon@google.com>
> ---
>  arch/x86/kvm/mmu/spte.c | 11 +++++++----
>  arch/x86/kvm/mmu/spte.h |  3 ++-
>  2 files changed, 9 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
> index b7271daa06c5..d3b059e96c6e 100644
> --- a/arch/x86/kvm/mmu/spte.c
> +++ b/arch/x86/kvm/mmu/spte.c
> @@ -93,7 +93,8 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
>  	       struct kvm_memory_slot *slot, unsigned int pte_access,
>  	       gfn_t gfn, kvm_pfn_t pfn, u64 old_spte, bool prefetch,
>  	       bool can_unsync, bool host_writable, bool ad_need_write_protect,
> -	       u64 mt_mask, u64 *new_spte)
> +	       u64 mt_mask, struct rsvd_bits_validate *shadow_zero_check,

Ugh, so I had a big email written about how I think we should add a
module param to control 4-level vs. 5-level for all TDP pages, but then I
realized it wouldn't work for nested EPT because that follows the root
level used by L1.  We could still make a global
non_nested_tdp_shadow_zero_check or whatever, but then make_spte() would
have to do some work to find the right rsvd_bits_validate, and the end
result would likely be a mess.

One idea to avoid exploding make_spte() would be to add a backpointer to
the MMU in kvm_mmu_page.  I don't love the idea, but I also don't love
passing in rsvd_bits_validate.
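A small sketch of that backpointer idea, purely for illustration; the mmu field in kvm_mmu_page and the helper wrapped around it are assumptions, not existing KVM code.

	/* Illustration only: give each shadow page a back-reference to the
	 * MMU that created it, so make_spte() can reach shadow_zero_check
	 * without a vCPU.  The 'mmu' field and where it would be set are
	 * assumptions. */
	struct kvm_mmu_page {
		/* ... existing fields ... */
		struct kvm_mmu *mmu;	/* hypothetical: MMU this page belongs to */
	};

	static bool sp_is_rsvd_spte(struct kvm_mmu_page *sp, u64 spte, int level)
	{
		/* No vCPU needed; the reserved-bit check comes from the owning MMU. */
		return is_rsvd_spte(&sp->mmu->shadow_zero_check, spte, level);
	}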
On Thu, Nov 18, 2021, Sean Christopherson wrote:
> On Wed, Nov 10, 2021, Ben Gardon wrote:
> > In the interest of developing a version of make_spte that can function
> > without a vCPU pointer, factor out the shadow_zero_mask to be an
> > additional argument to the function.
> >
> > No functional change intended.
> >
> > Signed-off-by: Ben Gardon <bgardon@google.com>
> > ---
> >  arch/x86/kvm/mmu/spte.c | 11 +++++++----
> >  arch/x86/kvm/mmu/spte.h |  3 ++-
> >  2 files changed, 9 insertions(+), 5 deletions(-)
> >
> > diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
> > index b7271daa06c5..d3b059e96c6e 100644
> > --- a/arch/x86/kvm/mmu/spte.c
> > +++ b/arch/x86/kvm/mmu/spte.c
> > @@ -93,7 +93,8 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
> >  	       struct kvm_memory_slot *slot, unsigned int pte_access,
> >  	       gfn_t gfn, kvm_pfn_t pfn, u64 old_spte, bool prefetch,
> >  	       bool can_unsync, bool host_writable, bool ad_need_write_protect,
> > -	       u64 mt_mask, u64 *new_spte)
> > +	       u64 mt_mask, struct rsvd_bits_validate *shadow_zero_check,
>
> Ugh, so I had a big email written about how I think we should add a
> module param to control 4-level vs. 5-level for all TDP pages, but then I
> realized it wouldn't work for nested EPT because that follows the root
> level used by L1.  We could still make a global
> non_nested_tdp_shadow_zero_check or whatever, but then make_spte() would
> have to do some work to find the right rsvd_bits_validate, and the end
> result would likely be a mess.
>
> One idea to avoid exploding make_spte() would be to add a backpointer to
> the MMU in kvm_mmu_page.  I don't love the idea, but I also don't love
> passing in rsvd_bits_validate.

Another idea.  The only difference between 5-level and 4-level is that
5-level fills in index [4], and I'm pretty sure 4-level doesn't touch
that index.  For PAE NPT (32-bit SVM), the shadow root level will never
change, so that's not an issue.

Nested NPT is the only case where anything for an EPT/NPT MMU can change,
because that follows EFER.NX.

In other words, the non-nested TDP reserved bits don't need to be
recalculated regardless of level; they can just fill in 5-level and leave
it be.

E.g. something like the below.  The sp->role.direct check could be
removed if we forced EFER.NX for nested NPT.

It's a bit ugly in that we'd pass both @kvm and @vcpu, so that needs some
more thought, but at minimum it means there's no need to recalc the
reserved bits.
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index 84e64dbdd89e..05db9b89dc53 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -95,10 +95,18 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 	       u64 old_spte, bool prefetch, bool can_unsync,
 	       bool host_writable, u64 *new_spte)
 {
+	struct rsvd_bits_validate *rsvd_check;
 	int level = sp->role.level;
 	u64 spte = SPTE_MMU_PRESENT_MASK;
 	bool wrprot = false;
 
+	if (vcpu) {
+		rsvd_check = &vcpu->arch.mmu->shadow_zero_check;
+	} else {
+		WARN_ON_ONCE(!tdp_enabled || !sp->role.direct);
+		rsvd_check = &tdp_shadow_rsvd_bits;
+	}
+
 	if (sp->role.ad_disabled)
 		spte |= SPTE_TDP_AD_DISABLED_MASK;
 	else if (kvm_mmu_page_ad_need_write_protect(sp))
@@ -177,9 +185,9 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 	if (prefetch)
 		spte = mark_spte_for_access_track(spte);
 
-	WARN_ONCE(is_rsvd_spte(&vcpu->arch.mmu->shadow_zero_check, spte, level),
+	WARN_ONCE(is_rsvd_spte(rsvd_check, spte, level),
 		  "spte = 0x%llx, level = %d, rsvd bits = 0x%llx", spte, level,
-		  get_rsvd_bits(&vcpu->arch.mmu->shadow_zero_check, spte, level));
+		  get_rsvd_bits(rsvd_check, spte, level));
 
 	if ((spte & PT_WRITABLE_MASK) && kvm_slot_dirty_track_enabled(slot)) {
 		/* Enforced by kvm_mmu_hugepage_adjust. */
On Thu, Nov 18, 2021, Sean Christopherson wrote:
> Another idea.  The only difference between 5-level and 4-level is that
> 5-level fills in index [4], and I'm pretty sure 4-level doesn't touch
> that index.  For PAE NPT (32-bit SVM), the shadow root level will never
> change, so that's not an issue.
>
> Nested NPT is the only case where anything for an EPT/NPT MMU can change,
> because that follows EFER.NX.
>
> In other words, the non-nested TDP reserved bits don't need to be
> recalculated regardless of level; they can just fill in 5-level and leave
> it be.
>
> E.g. something like the below.  The sp->role.direct check could be
> removed if we forced EFER.NX for nested NPT.
>
> It's a bit ugly in that we'd pass both @kvm and @vcpu, so that needs some
> more thought, but at minimum it means there's no need to recalc the
> reserved bits.

Ok, I think my final vote is to have the reserved bits passed in, but
with the non-nested TDP reserved bits being computed at MMU init.

I would also prefer to keep the existing make_spte() name so that there's
no churn in those call sites, and to make the relationship between the
wrapper, make_spte(), and the "real" helper, __make_spte(), more obvious
and aligned with the usual kernel style.

So with the kvm_vcpu_ad_need_write_protect() change and my proposed
hack-a-fix for kvm_x86_get_mt_mask(), the end result would look like:

bool __make_spte(struct kvm *kvm, struct kvm_mmu_page *sp,
		 struct kvm_memory_slot *slot, unsigned int pte_access,
		 gfn_t gfn, kvm_pfn_t pfn, u64 old_spte, bool prefetch,
		 bool can_unsync, bool host_writable, u64 *new_spte,
		 struct rsvd_bits_validate *shadow_rsvd_bits)
{
	int level = sp->role.level;
	u64 spte = SPTE_MMU_PRESENT_MASK;
	bool wrprot = false;

	if (sp->role.ad_disabled)
		spte |= SPTE_TDP_AD_DISABLED_MASK;
	else if (kvm_mmu_page_ad_need_write_protect(sp))
		spte |= SPTE_TDP_AD_WRPROT_ONLY_MASK;

	/*
	 * For the EPT case, shadow_present_mask is 0 if hardware
	 * supports exec-only page table entries.  In that case,
	 * ACC_USER_MASK and shadow_user_mask are used to represent
	 * read access.  See FNAME(gpte_access) in paging_tmpl.h.
	 */
	spte |= shadow_present_mask;
	if (!prefetch)
		spte |= spte_shadow_accessed_mask(spte);

	if (level > PG_LEVEL_4K && (pte_access & ACC_EXEC_MASK) &&
	    is_nx_huge_page_enabled()) {
		pte_access &= ~ACC_EXEC_MASK;
	}

	if (pte_access & ACC_EXEC_MASK)
		spte |= shadow_x_mask;
	else
		spte |= shadow_nx_mask;

	if (pte_access & ACC_USER_MASK)
		spte |= shadow_user_mask;

	if (level > PG_LEVEL_4K)
		spte |= PT_PAGE_SIZE_MASK;
	if (tdp_enabled)
		spte |= static_call(kvm_x86_get_mt_mask)(kvm, gfn,
							 kvm_is_mmio_pfn(pfn));

	if (host_writable)
		spte |= shadow_host_writable_mask;
	else
		pte_access &= ~ACC_WRITE_MASK;

	if (!kvm_is_mmio_pfn(pfn))
		spte |= shadow_me_mask;

	spte |= (u64)pfn << PAGE_SHIFT;

	if (pte_access & ACC_WRITE_MASK) {
		spte |= PT_WRITABLE_MASK | shadow_mmu_writable_mask;

		/*
		 * Optimization: for pte sync, if spte was writable the hash
		 * lookup is unnecessary (and expensive).  Write protection
		 * is responsibility of kvm_mmu_get_page / kvm_mmu_sync_roots.
		 * Same reasoning can be applied to dirty page accounting.
		 */
		if (is_writable_pte(old_spte))
			goto out;

		/*
		 * Unsync shadow pages that are reachable by the new, writable
		 * SPTE.  Write-protect the SPTE if the page can't be unsync'd,
		 * e.g. it's write-tracked (upper-level SPs) or has one or more
		 * shadow pages and unsync'ing pages is not allowed.
		 */
		if (mmu_try_to_unsync_pages(kvm, slot, gfn, can_unsync, prefetch)) {
			pgprintk("%s: found shadow page for %llx, marking ro\n",
				 __func__, gfn);
			wrprot = true;
			pte_access &= ~ACC_WRITE_MASK;
			spte &= ~(PT_WRITABLE_MASK | shadow_mmu_writable_mask);
		}
	}

	if (pte_access & ACC_WRITE_MASK)
		spte |= spte_shadow_dirty_mask(spte);

out:
	if (prefetch)
		spte = mark_spte_for_access_track(spte);

	WARN_ONCE(is_rsvd_spte(shadow_rsvd_bits, spte, level),
		  "spte = 0x%llx, level = %d, rsvd bits = 0x%llx", spte, level,
		  get_rsvd_bits(shadow_rsvd_bits, spte, level));

	if ((spte & PT_WRITABLE_MASK) && kvm_slot_dirty_track_enabled(slot)) {
		/* Enforced by kvm_mmu_hugepage_adjust. */
		WARN_ON(level > PG_LEVEL_4K);
		mark_page_dirty_in_slot(kvm, slot, gfn);
	}

	*new_spte = spte;
	return wrprot;
}

bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
	       struct kvm_memory_slot *slot, unsigned int pte_access,
	       gfn_t gfn, kvm_pfn_t pfn, u64 old_spte, bool prefetch,
	       bool can_unsync, bool host_writable, u64 *new_spte)
{
	return __make_spte(vcpu->kvm, sp, slot, pte_access, gfn, pfn, old_spte,
			   prefetch, can_unsync, host_writable, new_spte,
			   &vcpu->arch.mmu->shadow_zero_check);
}
On 11/18/21 17:37, Sean Christopherson wrote:
>> It's a bit ugly in that we'd pass both @kvm and @vcpu, so that needs some more
>> thought, but at minimum it means there's no need to recalc the reserved bits.
>
> Ok, I think my final vote is to have the reserved bits passed in, but with the
> non-nested TDP reserved bits being computed at MMU init.

Yes, and that's also where I was getting with the idea of moving part of
the "direct" MMU (man, naming these things is so hard) to struct kvm:
split the per-vCPU state from the constant one and initialize the latter
just once.  Though perhaps I was putting the cart slightly before the
horse.

On the topic of naming, we have a lot of things to name:

- the two MMU codebases: you Googlers are trying to grandfather "legacy"
  and "TDP" into upstream, but that's not a great name because the former
  is used also when shadowing EPT/NPT.  I'm thinking of standardizing on
  "shadow" and "TDP" (it's not perfect because of the 32-bit and
  tdp_mmu=0 cases, but it's a start).  Maybe even split parts of mmu.c
  out into shadow_mmu.c.

- the two walkers (I'm quite convinced of splitting that part out of
  struct kvm_mmu and getting rid of walk_mmu/nested_mmu): that's easy, it
  can be walk01 and walk12, with "walk" pointing to one of them

- the two MMUs: with nested_mmu gone, root_mmu and guest_mmu are much
  less confusing and we can keep those names.

Paolo
On Thu, Nov 18, 2021, Paolo Bonzini wrote:
> On 11/18/21 17:37, Sean Christopherson wrote:
> > > It's a bit ugly in that we'd pass both @kvm and @vcpu, so that needs some more
> > > thought, but at minimum it means there's no need to recalc the reserved bits.
> >
> > Ok, I think my final vote is to have the reserved bits passed in, but with the
> > non-nested TDP reserved bits being computed at MMU init.
>
> Yes, and that's also where I was getting with the idea of moving part of the
> "direct" MMU (man, naming these things is so hard) to struct kvm: split the
> per-vCPU state from the constant one and initialize the latter just once.
> Though perhaps I was putting the cart slightly before the horse.
>
> On the topic of naming, we have a lot of things to name:
>
> - the two MMU codebases: you Googlers are trying to grandfather "legacy" and
>   "TDP" into upstream

Heh, I think that's like 99.9% me.

> but that's not a great name because the former is used also when shadowing
> EPT/NPT.  I'm thinking of standardizing on "shadow" and "TDP" (it's not
> perfect because of the 32-bit and tdp_mmu=0 cases, but it's a start).  Maybe
> even split parts of mmu.c out into shadow_mmu.c.

But shadow is flat out wrong until EPT and NPT support is ripped out of
the "legacy" MMU.

> - the two walkers (I'm quite convinced of splitting that part out of struct
>   kvm_mmu and getting rid of walk_mmu/nested_mmu): that's easy, it can be
>   walk01 and walk12 with "walk" pointing to one of them

I am all in favor of walk01 and walk12; the guest_mmu vs. nested_mmu
confusion is painful.

> - the two MMUs: with nested_mmu gone, root_mmu and guest_mmu are much less
>   confusing and we can keep those names.

I would prefer root_mmu and nested_tdp_mmu.  guest_mmu is misleading
because it's not used for all cases of sp->role.guest_mode=1, i.e. when
L1 is not using TDP then guest_mode=1 but KVM isn't using guest_mmu.
On 11/18/21 19:02, Sean Christopherson wrote:
>> but that's not a great name because the former is used also when shadowing
>> EPT/NPT.  I'm thinking of standardizing on "shadow" and "TDP" (it's not
>> perfect because of the 32-bit and tdp_mmu=0 cases, but it's a start).  Maybe
>> even split parts of mmu.c out into shadow_mmu.c.
>
> But shadow is flat out wrong until EPT and NPT support is ripped out of the
> "legacy" MMU.

Yeah, that's true.  "full" MMU? :)

>> - the two walkers (I'm quite convinced of splitting that part out of struct
>>   kvm_mmu and getting rid of walk_mmu/nested_mmu): that's easy, it can be
>>   walk01 and walk12 with "walk" pointing to one of them
>
> I am all in favor of walk01 and walk12, the guest_mmu vs. nested_mmu confusion
> is painful.
>
>> - the two MMUs: with nested_mmu gone, root_mmu and guest_mmu are much less
>>   confusing and we can keep those names.
>
> I would prefer root_mmu and nested_tdp_mmu.  guest_mmu is misleading because
> it's not used for all cases of sp->role.guest_mode=1, i.e. when L1 is not
> using TDP then guest_mode=1 but KVM isn't using guest_mmu.

Ok, that sounds good too.

Paolo
On Thu, Nov 18, 2021, Paolo Bonzini wrote:
> On 11/18/21 19:02, Sean Christopherson wrote:
> > > but that's not a great name because the former is used also when shadowing
> > > EPT/NPT.  I'm thinking of standardizing on "shadow" and "TDP" (it's not
> > > perfect because of the 32-bit and tdp_mmu=0 cases, but it's a start).  Maybe
> > > even split parts of mmu.c out into shadow_mmu.c.
> >
> > But shadow is flat out wrong until EPT and NPT support is ripped out of the
> > "legacy" MMU.
>
> Yeah, that's true.  "full" MMU? :)

Or we could just rip out non-nested TDP support from the legacy MMU and
call it the shadow MMU :-)
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index b7271daa06c5..d3b059e96c6e 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -93,7 +93,8 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 	       struct kvm_memory_slot *slot, unsigned int pte_access,
 	       gfn_t gfn, kvm_pfn_t pfn, u64 old_spte, bool prefetch,
 	       bool can_unsync, bool host_writable, bool ad_need_write_protect,
-	       u64 mt_mask, u64 *new_spte)
+	       u64 mt_mask, struct rsvd_bits_validate *shadow_zero_check,
+	       u64 *new_spte)
 {
 	int level = sp->role.level;
 	u64 spte = SPTE_MMU_PRESENT_MASK;
@@ -176,9 +177,9 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 	if (prefetch)
 		spte = mark_spte_for_access_track(spte);
 
-	WARN_ONCE(is_rsvd_spte(&vcpu->arch.mmu->shadow_zero_check, spte, level),
+	WARN_ONCE(is_rsvd_spte(shadow_zero_check, spte, level),
 		  "spte = 0x%llx, level = %d, rsvd bits = 0x%llx", spte, level,
-		  get_rsvd_bits(&vcpu->arch.mmu->shadow_zero_check, spte, level));
+		  get_rsvd_bits(shadow_zero_check, spte, level));
 
 	if ((spte & PT_WRITABLE_MASK) && kvm_slot_dirty_track_enabled(slot)) {
 		/* Enforced by kvm_mmu_hugepage_adjust. */
@@ -198,10 +199,12 @@ bool vcpu_make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 	bool ad_need_write_protect = kvm_vcpu_ad_need_write_protect(vcpu);
 	u64 mt_mask = static_call(kvm_x86_get_mt_mask)(vcpu, gfn,
 						       kvm_is_mmio_pfn(pfn));
+	struct rsvd_bits_validate *shadow_zero_check = &vcpu->arch.mmu->shadow_zero_check;
 
 	return make_spte(vcpu, sp, slot, pte_access, gfn, pfn, old_spte,
 			 prefetch, can_unsync, host_writable,
-			 ad_need_write_protect, mt_mask, new_spte);
+			 ad_need_write_protect, mt_mask, shadow_zero_check,
+			 new_spte);
 }
 
diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index e739f2ebf844..6134a10487c4 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -333,7 +333,8 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 	       struct kvm_memory_slot *slot, unsigned int pte_access,
 	       gfn_t gfn, kvm_pfn_t pfn, u64 old_spte, bool prefetch,
 	       bool can_unsync, bool host_writable, bool ad_need_write_protect,
-	       u64 mt_mask, u64 *new_spte);
+	       u64 mt_mask, struct rsvd_bits_validate *shadow_zero_check,
+	       u64 *new_spte);
 bool vcpu_make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 		    struct kvm_memory_slot *slot, unsigned int pte_access,
 		    gfn_t gfn, kvm_pfn_t pfn,
In the interest of developing a version of make_spte that can function
without a vCPU pointer, factor out the shadow_zero_mask to be an
additional argument to the function.

No functional change intended.

Signed-off-by: Ben Gardon <bgardon@google.com>
---
 arch/x86/kvm/mmu/spte.c | 11 +++++++----
 arch/x86/kvm/mmu/spte.h |  3 ++-
 2 files changed, 9 insertions(+), 5 deletions(-)