Message ID | 20221208193857.4090582-2-dmatlack@google.com (mailing list archive) |
---|---|
State | New, archived |
Series | KVM: Refactor the KVM/x86 TDP MMU into common code |
On 12/9/2022 3:38 AM, David Matlack wrote:
> Rename kvm_mmu_page_role.smm with kvm_mmu_page_role.as_id and use it
> directly as the address space ID throughout the KVM MMU code. This
> eliminates a needless level of indirection, kvm_mmu_role_as_id(), and
> prepares for making kvm_mmu_page_role architecture-neutral.
>
> Signed-off-by: David Matlack <dmatlack@google.com>
> ---
[...]
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 4d188f056933..f375b719f565 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -5056,7 +5056,7 @@ kvm_calc_cpu_role(struct kvm_vcpu *vcpu, const struct kvm_mmu_role_regs *regs)
>  	union kvm_cpu_role role = {0};
>
>  	role.base.access = ACC_ALL;
> -	role.base.smm = is_smm(vcpu);
> +	role.base.as_id = is_smm(vcpu);

I'm not familiar with other architectures; is there a similar concept to
x86 SMM mode?

If not, maybe is_smm() needs to be reshaped into a common helper to get
the as_id.

[...]
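For context on the question above, the arch/common split already takes roughly
this shape in the patch's kvm_host.h hunk: x86 maps SMM onto memslot address
space 1, while other architectures report 0. A rough sketch of that arrangement
(mirroring the existing kvm_arch_vcpu_memslots_id() definitions as I understand
them, not new code from this series):

/*
 * Illustration only: is_smm() stays x86-internal; generic code only ever
 * sees the resulting address space ID. The #else branch mirrors the stub
 * KVM already provides for architectures with a single address space.
 */
#ifdef __KVM_VCPU_MULTIPLE_ADDRESS_SPACE
/* x86 with SMM: address space 1 while the vCPU is in SMM, 0 otherwise. */
# define kvm_arch_vcpu_memslots_id(vcpu) \
        ((vcpu)->arch.hflags & HF_SMM_MASK ? 1 : 0)
#else
/* All other architectures: a single address space, always 0. */
static inline int kvm_arch_vcpu_memslots_id(struct kvm_vcpu *vcpu)
{
        return 0;
}
#endif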
On Fri, Dec 09, 2022 at 10:37:47AM +0800, Yang, Weijiang wrote:
>
> On 12/9/2022 3:38 AM, David Matlack wrote:
> > Rename kvm_mmu_page_role.smm with kvm_mmu_page_role.as_id and use it
> > directly as the address space ID throughout the KVM MMU code. This
> > eliminates a needless level of indirection, kvm_mmu_role_as_id(), and
> > prepares for making kvm_mmu_page_role architecture-neutral.
> >
> > Signed-off-by: David Matlack <dmatlack@google.com>
> > ---
[...]
>
> I'm not familiar with other architectures; is there a similar concept to
> x86 SMM mode?

For KVM/arm64:

No, we don't do anything like SMM emulation on x86. Architecturally
speaking, though, we do have a higher level of privilege typically
used by firmware on arm64, called EL3.

I'll need to read David's series a bit more closely, but I'm inclined to
think that the page role is going to be rather arch-specific.

--
Thanks,
Oliver
On Fri, Dec 9, 2022 at 9:25 AM Oliver Upton <oliver.upton@linux.dev> wrote:
>
> On Fri, Dec 09, 2022 at 10:37:47AM +0800, Yang, Weijiang wrote:
> >
> > On 12/9/2022 3:38 AM, David Matlack wrote:
> > > Rename kvm_mmu_page_role.smm with kvm_mmu_page_role.as_id and use it
> > > directly as the address space ID throughout the KVM MMU code. This
> > > eliminates a needless level of indirection, kvm_mmu_role_as_id(), and
> > > prepares for making kvm_mmu_page_role architecture-neutral.
> > >
> > > Signed-off-by: David Matlack <dmatlack@google.com>
> > > ---
[...]
> > > @@ -5056,7 +5056,7 @@ kvm_calc_cpu_role(struct kvm_vcpu *vcpu, const struct kvm_mmu_role_regs *regs)
> > >  	union kvm_cpu_role role = {0};
> > >
> > >  	role.base.access = ACC_ALL;
> > > -	role.base.smm = is_smm(vcpu);
> > > +	role.base.as_id = is_smm(vcpu);
> >
> > I'm not familiar with other architectures; is there a similar concept to
> > x86 SMM mode?

The notion of address spaces is already an existing architecture-neutral
concept in KVM (e.g. see uses of KVM_ADDRESS_SPACE_NUM in
virt/kvm/kvm_main.c), although SMM is the only use-case I'm aware of.
Architectures that do not use multiple address spaces will just leave
as_id as always 0.

> For KVM/arm64:
>
> No, we don't do anything like SMM emulation on x86. Architecturally
> speaking, though, we do have a higher level of privilege typically
> used by firmware on arm64, called EL3.
>
> I'll need to read David's series a bit more closely, but I'm inclined to
> think that the page role is going to be rather arch-specific.

Yes, most of the fields are in the arch-specific sub-role. The TDP MMU only
needs to know about the as_id, level, and invalid bits. (See next patch.)
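A minimal sketch of the plumbing being described (the helper name below is made
up for illustration; __kvm_memslots(), KVM_ADDRESS_SPACE_NUM and struct
kvm_mmu_page are existing KVM symbols, and role.as_id is the field introduced
by this patch):

/*
 * Each address space has its own set of memslots; the page role's as_id
 * simply indexes that array. KVM_ADDRESS_SPACE_NUM is 2 only on x86 with
 * SMM support, and 1 everywhere else.
 */
static struct kvm_memslots *sp_to_memslots(struct kvm *kvm,
                                           struct kvm_mmu_page *sp)
{
        WARN_ON_ONCE(sp->role.as_id >= KVM_ADDRESS_SPACE_NUM);
        return __kvm_memslots(kvm, sp->role.as_id);
}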
On Fri, Dec 09, 2022, David Matlack wrote:
> On Fri, Dec 9, 2022 at 9:25 AM Oliver Upton <oliver.upton@linux.dev> wrote:
> >
> > On Fri, Dec 09, 2022 at 10:37:47AM +0800, Yang, Weijiang wrote:
> > > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > > > index 4d188f056933..f375b719f565 100644
> > > > --- a/arch/x86/kvm/mmu/mmu.c
> > > > +++ b/arch/x86/kvm/mmu/mmu.c
> > > > @@ -5056,7 +5056,7 @@ kvm_calc_cpu_role(struct kvm_vcpu *vcpu, const struct kvm_mmu_role_regs *regs)
> > > >  	union kvm_cpu_role role = {0};
> > > >
> > > >  	role.base.access = ACC_ALL;
> > > > -	role.base.smm = is_smm(vcpu);
> > > > +	role.base.as_id = is_smm(vcpu);
> > >
> > > I'm not familiar with other architectures; is there a similar concept to
> > > x86 SMM mode?
>
> The notion of address spaces is already an existing architecture-neutral
> concept in KVM (e.g. see uses of KVM_ADDRESS_SPACE_NUM in
> virt/kvm/kvm_main.c), although SMM is the only use-case I'm aware of.

Yes, SMM is currently the only use-case.

> Architectures that do not use multiple address spaces will just leave
> as_id as always 0.

My preference would be to leave .smm in x86's page role. IMO, defining multiple
address spaces to support SMM emulation was a mistake that should be contained to
SMM, i.e. should never be used for any other feature. And with CONFIG_KVM_SMM,
even x86 can opt out.

For all potential use cases I'm aware of, SMM included, separate address spaces
are overkill. The SMM use case is to define a region of guest memory that is
accessible if and only if the vCPU is operating in SMM. Emulating something like
TrustZone or EL3 would be quite similar. Ditto for Intel's TXT Private Space
(though I can't imagine KVM ever emulating TXT :-) ).

Using separate address spaces means that userspace needs to define the
overlapping GPA areas multiple times, which is inefficient for both memory and
CPU usage. E.g. for SMM, userspace needs to redefine all of "regular" memory for
SMM in addition to memory that is SMM-only. And more bizarrely, nothing prevents
userspace from defining completely different memslot layouts for each address
space, which may not add complexity in terms of code, but does make it more
difficult to reason about KVM behavior at the boundaries between modes.

Unless I'm missing something, e.g. a need to map GPAs differently for SMM vs.
non-SMM, SMM could have been implemented with a simple flag in a memslot to mark
the memslot as SMM-only. Or likely even better, as an overlay to track
attributes, e.g. similar to how private vs. shared memory will be handled for
protected VMs. That would be slightly less efficient for memslot searches for
use cases where all memory is mutually exclusive, but simpler and more efficient
overall.

And separate address spaces become truly nasty if the CPU can access multiple
protected regions, e.g. if the CPU can access type X and type Y at the same
time, then there would need to be memslots for "regular", X, Y, and X+Y.
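For illustration only, a sketch of the "flag in a memslot" alternative described
above. The flag name and helper are hypothetical and this design was never
merged (per Paolo's follow-up, a KVM_MEM_X86_SMRAM flag was prototyped and
dropped); overlapping regions are exactly where it falls short:

/* Hypothetical: a per-memslot flag instead of a second address space. */
#define KVM_MEM_X86_SMRAM_ONLY  (1UL << 16)     /* not a real KVM uAPI flag */

static bool slot_visible_to_vcpu(struct kvm_vcpu *vcpu,
                                 const struct kvm_memory_slot *slot)
{
        /* An SMM-only slot exists, for this vCPU, only while it is in SMM. */
        if (slot->flags & KVM_MEM_X86_SMRAM_ONLY)
                return is_smm(vcpu);

        return true;
}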
On Mon, Dec 12, 2022 at 05:39:38PM +0000, Sean Christopherson wrote:
> On Fri, Dec 09, 2022, David Matlack wrote:
> > On Fri, Dec 9, 2022 at 9:25 AM Oliver Upton <oliver.upton@linux.dev> wrote:
> > >
> > > On Fri, Dec 09, 2022 at 10:37:47AM +0800, Yang, Weijiang wrote:
> > > > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > > > > index 4d188f056933..f375b719f565 100644
> > > > > --- a/arch/x86/kvm/mmu/mmu.c
> > > > > +++ b/arch/x86/kvm/mmu/mmu.c
> > > > > @@ -5056,7 +5056,7 @@ kvm_calc_cpu_role(struct kvm_vcpu *vcpu, const struct kvm_mmu_role_regs *regs)
> > > > >  	union kvm_cpu_role role = {0};
> > > > >
> > > > >  	role.base.access = ACC_ALL;
> > > > > -	role.base.smm = is_smm(vcpu);
> > > > > +	role.base.as_id = is_smm(vcpu);
> > > >
> > > > I'm not familiar with other architectures; is there a similar concept to
> > > > x86 SMM mode?
> >
> > The notion of address spaces is already an existing architecture-neutral
> > concept in KVM (e.g. see uses of KVM_ADDRESS_SPACE_NUM in
> > virt/kvm/kvm_main.c), although SMM is the only use-case I'm aware of.
>
> Yes, SMM is currently the only use-case.
>
> > Architectures that do not use multiple address spaces will just leave
> > as_id as always 0.
>
> My preference would be to leave .smm in x86's page role. IMO, defining multiple
> address spaces to support SMM emulation was a mistake that should be contained to
> SMM, i.e. should never be used for any other feature. And with CONFIG_KVM_SMM,
> even x86 can opt out.

+1

I don't think something is architecture-neutral by virtue of it existing
in virt/kvm/*.

> Emulating something like TrustZone or EL3 would be quite similar.

/me shudders

I know it is for the sake of discussion, but for posterity: please no!
On 12/12/22 18:39, Sean Christopherson wrote:
>> The notion of address spaces is already an existing architecture-neutral
>> concept in KVM (e.g. see uses of KVM_ADDRESS_SPACE_NUM in
>> virt/kvm/kvm_main.c), although SMM is the only use-case I'm aware of.
>
> Yes, SMM is currently the only use-case.

It's possible that in the future Hyper-V VTLs will also have per-level
protections. It wouldn't use as_id, but it would likely be recorded in the
upper byte of the role.

I'm not sure if Microsoft intends to port those to ARM as well.

> My preference would be to leave .smm in x86's page role

What about defining a byte of arch_role and a macro to build it?

> Unless I'm missing something, e.g. a need to map GPAs differently for
> SMM vs. non-SMM, SMM could have been implemented with a simple flag
> in a memslot to mark the memslot as SMM-only.

Unfortunately not, because there can be another region (for example video
RAM at 0A0000h) underneath SMRAM.

In fact, KVM_MEM_X86_SMRAM was the first idea. It was uglier than multiple
address spaces (https://lore.kernel.org/lkml/1431084034-8425-1-git-send-email-pbonzini@redhat.com).
In retrospect there were probably ways to mix the best of the two designs,
but it wasn't generic enough.

> And separate address spaces become truly nasty if the CPU can access multiple
> protected regions, e.g. if the CPU can access type X and type Y at the same time,
> then there would need to be memslots for "regular", X, Y, and X+Y.

Without a usecase that's hard to say. It's just as possible that there
would be a natural hierarchy of levels.

Paolo
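To make the overlap concrete, here is a rough userspace-side sketch using the
existing KVM_SET_USER_MEMORY_REGION ioctl, where the address space ID goes in
the upper 16 bits of the slot field (KVM_CAP_MULTI_ADDRESS_SPACE). The slot
numbers, sizes and backing pointers are illustrative:

#include <linux/kvm.h>
#include <sys/ioctl.h>

/* Address space IDs as x86 KVM uses them: 0 = normal, 1 = SMM. */
#define AS_NORMAL       0
#define AS_SMM          1

static void map_legacy_vga_and_smram(int vm_fd, void *vga_mem, void *smram_mem)
{
        struct kvm_userspace_memory_region region = {
                .guest_phys_addr = 0xa0000,     /* legacy VGA window / SMRAM */
                .memory_size     = 0x20000,     /* 128 KiB: 0xA0000 - 0xBFFFF */
        };

        /* Non-SMM view: GPA 0xA0000 is the video buffer. */
        region.slot = (AS_NORMAL << 16) | 1;
        region.userspace_addr = (unsigned long)vga_mem;
        ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region);

        /* SMM view: the very same GPA range is backed by SMRAM instead. */
        region.slot = (AS_SMM << 16) | 1;
        region.userspace_addr = (unsigned long)smram_mem;
        ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region);
}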
On Mon, Dec 12, 2022 at 06:17:36PM +0000, Oliver Upton wrote:
> On Mon, Dec 12, 2022 at 05:39:38PM +0000, Sean Christopherson wrote:
> > On Fri, Dec 09, 2022, David Matlack wrote:
> > > On Fri, Dec 9, 2022 at 9:25 AM Oliver Upton <oliver.upton@linux.dev> wrote:
> > My preference would be to leave .smm in x86's page role. IMO, defining multiple
> > address spaces to support SMM emulation was a mistake that should be contained to
> > SMM, i.e. should never be used for any other feature. And with CONFIG_KVM_SMM,
> > even x86 can opt out.
>
> +1
>
> I don't think something is architecture-neutral by virtue of it existing
> in virt/kvm/*.

Put another way, just because something exists in virt/kvm/* doesn't mean
it is used by (or will be useful to) more than one architecture. Totally
agree. In this case, there never turned out to be any other usecases for
memslot address spaces.

As for role.arch.smm vs role.as_id, I'll post my response on the other
thread with Paolo. Juggling these threads is hard.
On Mon, Dec 12, 2022 at 11:50:29PM +0100, Paolo Bonzini wrote:
> On 12/12/22 18:39, Sean Christopherson wrote:
> > > The notion of address spaces is already an existing architecture-neutral
> > > concept in KVM (e.g. see uses of KVM_ADDRESS_SPACE_NUM in
> > > virt/kvm/kvm_main.c), although SMM is the only use-case I'm aware of.
> >
> > Yes, SMM is currently the only use-case.
>
> It's possible that in the future Hyper-V VTLs will also have per-level
> protections. It wouldn't use as_id, but it would likely be recorded in the
> upper byte of the role.
>
> I'm not sure if Microsoft intends to port those to ARM as well.
>
> > My preference would be to leave .smm in x86's page role
>
> What about defining a byte of arch_role and a macro to build it?

Both would work. I went with as_id in the common role since that's how it's
encoded in kvm_memory_slot and because, no matter what, the TDP MMU still
has to handle multiple address spaces. i.e. Even if we hide SMM away in the
role, the TDP MMU still has to access it with some wrapper e.g.
kvm_mmu_page_as_id() (that would just return 0 outside of x86). From that
perspective, just having as_id directly in the common role seemed like the
cleanest option.

The only way to truly shield the TDP MMU from SMM would be to disallow it.
e.g. Disable the TDP MMU if defined(CONFIG_KVM_SMM), or something similar.
But I don't know enough about how KVM SMM support is used to say if that's
even worth entertaining.
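A sketch of the wrapper being discussed, to show the trade-off: the x86 half
mirrors the kvm_mmu_page_as_id() helper this patch removes from mmu_internal.h,
while the non-x86 stub is hypothetical and would only be needed if .smm stayed
architecture-specific:

/*
 * If .smm stayed in x86's portion of the role, common TDP MMU code would
 * still need an as_id accessor along these lines.
 */
#ifdef CONFIG_X86
static inline int kvm_mmu_page_as_id(struct kvm_mmu_page *sp)
{
        return sp->role.smm ? 1 : 0;
}
#else
static inline int kvm_mmu_page_as_id(struct kvm_mmu_page *sp)
{
        return 0;       /* only one memslot address space outside of x86 */
}
#endif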
On Mon, Dec 12, 2022, Paolo Bonzini wrote:
> On 12/12/22 18:39, Sean Christopherson wrote:
> > My preference would be to leave .smm in x86's page role
>
> What about defining a byte of arch_role and a macro to build it?

I think David already carved out a big chunk for arch role bits; my objection
was purely to making as_id a generic 8-bit role.

> > Unless I'm missing something, e.g. a need to map GPAs differently for
> > SMM vs. non-SMM, SMM could have been implemented with a simple flag
> > in a memslot to mark the memslot as SMM-only.
>
> Unfortunately not, because there can be another region (for example video
> RAM at 0A0000h) underneath SMRAM.

Ugh, it's even a very explicitly documented "feature".

  When compatible SMM space is enabled, SMM-mode CBO accesses to this range
  route to physical system DRAM at 00_000A_0000h – 00_000B_FFFFh.

  Non-SMM mode CBO accesses to this range are considered to be to the Video
  Buffer Area as described above. PCI Express* and DMI originated cycles to
  SMM space are not supported and are considered to be to the Video Buffer
  Area.

I also forgot KVM supports SMBASE relocation :-(

> In fact, KVM_MEM_X86_SMRAM was the first idea. It was uglier than multiple
> address spaces (https://lore.kernel.org/lkml/1431084034-8425-1-git-send-email-pbonzini@redhat.com).
> In retrospect there were probably ways to mix the best of the two designs,
> but it wasn't generic enough.
>
> > And separate address spaces become truly nasty if the CPU can access multiple
> > protected regions, e.g. if the CPU can access type X and type Y at the same time,
> > then there would need to be memslots for "regular", X, Y, and X+Y.
>
> Without a usecase that's hard to say. It's just as possible that there
> would be a natural hierarchy of levels.

Ah, true.
On Tue, Dec 13, 2022 at 1:47 AM Sean Christopherson <seanjc@google.com> wrote:
>
> My preference would be to leave .smm in x86's page role. IMO, defining multiple
> address spaces to support SMM emulation was a mistake that should be contained to
> SMM, i.e. should never be used for any other feature. And with CONFIG_KVM_SMM,
> even x86 can opt out.
>

I think the name ASID in kvm/x86 should be used for vmcb's ASID,
vmcs's VPID, and PCID. Using the name ASID for other purposes
would only result in unnecessary confusion.

There is a bug in shadow paging when it uses two separate sets
of memslots, which use two sets of rmap and page-tracking.

When the SMM world writes to a non-SMM page which happens to be
a guest pagetable in the non-SMM world, the write operation goes
through without being specially handled, and the shadow page for
the guest pagetable is neither unshadowed nor marked unsync. The
shadow paging code is unaware that the shadow page has deviated
from the guest pagetable.

It means that when SMM is enabled, shadow paging should be disabled,
which also means KVM has to use TDP and cannot use nested TDP.

Thanks
Lai
On Wed, Dec 14, 2022, Lai Jiangshan wrote:
> On Tue, Dec 13, 2022 at 1:47 AM Sean Christopherson <seanjc@google.com> wrote:
> >
> > My preference would be to leave .smm in x86's page role. IMO, defining multiple
> > address spaces to support SMM emulation was a mistake that should be contained to
> > SMM, i.e. should never be used for any other feature. And with CONFIG_KVM_SMM,
> > even x86 can opt out.
>
> I think the name ASID in kvm/x86 should be used for vmcb's ASID,
> vmcs's VPID, and PCID. Using the name ASID for other purposes
> would only result in unnecessary confusion.

I agree in principle, but at this point fixing the problem would require a lot of
churn since "as_id" is pervasive throughout the memslots code.

It should be possible though, as I don't think anything in KVM's uAPI explicitly
takes an as_id, i.e. it's KVM-internal terminology for the most part.

> There is a bug in shadow paging when it uses two separate sets
> of memslots, which use two sets of rmap and page-tracking.
>
> When the SMM world writes to a non-SMM page which happens to be
> a guest pagetable in the non-SMM world, the write operation goes
> through without being specially handled, and the shadow page for
> the guest pagetable is neither unshadowed nor marked unsync. The
> shadow paging code is unaware that the shadow page has deviated
> from the guest pagetable.

Won't the unsync code work as intended? for_each_gfn_valid_sp_with_gptes()
doesn't consume the slot and I don't see any explicit filtering on role.smm,
i.e. should unsync all SPTEs for the gfn.

Addressing the page-track case is a bit gross, but doesn't appear to be too
difficult. I wouldn't be surprised if there are other SMM => non-SMM bugs lurking
though.

---
 arch/x86/include/asm/kvm_page_track.h |  2 +-
 arch/x86/kvm/mmu/mmu.c                |  7 +++---
 arch/x86/kvm/mmu/mmu_internal.h       |  3 ++-
 arch/x86/kvm/mmu/page_track.c         | 32 +++++++++++++++++++--------
 arch/x86/kvm/mmu/spte.c               |  2 +-
 5 files changed, 31 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index eb186bc57f6a..fdd9de31e160 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -63,7 +63,7 @@ void kvm_slot_page_track_add_page(struct kvm *kvm,
 void kvm_slot_page_track_remove_page(struct kvm *kvm,
                                      struct kvm_memory_slot *slot, gfn_t gfn,
                                      enum kvm_page_track_mode mode);
-bool kvm_slot_page_track_is_active(struct kvm *kvm,
+bool kvm_slot_page_track_is_active(struct kvm_vcpu *vcpu,
                                    const struct kvm_memory_slot *slot,
                                    gfn_t gfn, enum kvm_page_track_mode mode);

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 254bc46234e0..1c2200042133 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2715,9 +2715,10 @@ static void kvm_unsync_page(struct kvm *kvm, struct kvm_mmu_page *sp)
  * were marked unsync (or if there is no shadow page), -EPERM if the SPTE must
  * be write-protected.
  */
-int mmu_try_to_unsync_pages(struct kvm *kvm, const struct kvm_memory_slot *slot,
+int mmu_try_to_unsync_pages(struct kvm_vcpu *vcpu, const struct kvm_memory_slot *slot,
                             gfn_t gfn, bool can_unsync, bool prefetch)
 {
+       struct kvm *kvm = vcpu->kvm;
        struct kvm_mmu_page *sp;
        bool locked = false;

@@ -2726,7 +2727,7 @@ int mmu_try_to_unsync_pages(struct kvm *kvm, const struct kvm_memory_slot *slot,
         * track machinery is used to write-protect upper-level shadow pages,
         * i.e. this guards the role.level == 4K assertion below!
         */
-       if (kvm_slot_page_track_is_active(kvm, slot, gfn, KVM_PAGE_TRACK_WRITE))
+       if (kvm_slot_page_track_is_active(vcpu, slot, gfn, KVM_PAGE_TRACK_WRITE))
                return -EPERM;

        /*
@@ -4127,7 +4128,7 @@ static bool page_fault_handle_page_track(struct kvm_vcpu *vcpu,
         * guest is writing the page which is write tracked which can
         * not be fixed by page fault handler.
         */
-       if (kvm_slot_page_track_is_active(vcpu->kvm, fault->slot, fault->gfn, KVM_PAGE_TRACK_WRITE))
+       if (kvm_slot_page_track_is_active(vcpu, fault->slot, fault->gfn, KVM_PAGE_TRACK_WRITE))
                return true;

        return false;
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index ac00bfbf32f6..38040ab27986 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -156,7 +156,8 @@ static inline bool kvm_mmu_page_ad_need_write_protect(struct kvm_mmu_page *sp)
        return kvm_x86_ops.cpu_dirty_log_size && sp->role.guest_mode;
 }

-int mmu_try_to_unsync_pages(struct kvm *kvm, const struct kvm_memory_slot *slot,
+int mmu_try_to_unsync_pages(struct kvm_vcpu *vcpu,
+                           const struct kvm_memory_slot *slot,
                            gfn_t gfn, bool can_unsync, bool prefetch);

 void kvm_mmu_gfn_disallow_lpage(const struct kvm_memory_slot *slot, gfn_t gfn);
diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index 2e09d1b6249f..0e9bc837257e 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -18,6 +18,7 @@

 #include "mmu.h"
 #include "mmu_internal.h"
+#include "smm.h"

 bool kvm_page_track_write_tracking_enabled(struct kvm *kvm)
 {
@@ -171,27 +172,40 @@ void kvm_slot_page_track_remove_page(struct kvm *kvm,
 }
 EXPORT_SYMBOL_GPL(kvm_slot_page_track_remove_page);

+static bool __kvm_slot_page_track_is_active(const struct kvm_memory_slot *slot,
+                                           gfn_t gfn, enum kvm_page_track_mode mode)
+{
+       int index;
+
+       if (!slot)
+               return false;
+
+       index = gfn_to_index(gfn, slot->base_gfn, PG_LEVEL_4K);
+       return !!READ_ONCE(slot->arch.gfn_track[mode][index]);
+}
+
 /*
  * check if the corresponding access on the specified guest page is tracked.
  */
-bool kvm_slot_page_track_is_active(struct kvm *kvm,
+bool kvm_slot_page_track_is_active(struct kvm_vcpu *vcpu,
                                   const struct kvm_memory_slot *slot,
                                   gfn_t gfn, enum kvm_page_track_mode mode)
 {
-       int index;
-
        if (WARN_ON(!page_track_mode_is_valid(mode)))
                return false;

-       if (!slot)
-               return false;
-
        if (mode == KVM_PAGE_TRACK_WRITE &&
-           !kvm_page_track_write_tracking_enabled(kvm))
+           !kvm_page_track_write_tracking_enabled(vcpu->kvm))
                return false;

-       index = gfn_to_index(gfn, slot->base_gfn, PG_LEVEL_4K);
-       return !!READ_ONCE(slot->arch.gfn_track[mode][index]);
+       if (__kvm_slot_page_track_is_active(slot, gfn, mode))
+               return true;
+
+       if (!is_smm(vcpu))
+               return false;
+
+       return __kvm_slot_page_track_is_active(gfn_to_memslot(vcpu->kvm, gfn),
+                                              gfn, mode);
 }

 void kvm_page_track_cleanup(struct kvm *kvm)
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index c0fd7e049b4e..89ddd113c1b9 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -220,7 +220,7 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
         * e.g. it's write-tracked (upper-level SPs) or has one or more
         * shadow pages and unsync'ing pages is not allowed.
         */
-       if (mmu_try_to_unsync_pages(vcpu->kvm, slot, gfn, can_unsync, prefetch)) {
+       if (mmu_try_to_unsync_pages(vcpu, slot, gfn, can_unsync, prefetch)) {
                pgprintk("%s: found shadow page for %llx, marking ro\n",
                         __func__, gfn);
                wrprot = true;

base-commit: e0ef1f14e97ff65adf6e2157952dbbd1e482065c
On Thu, Dec 15, 2022 at 3:42 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Wed, Dec 14, 2022, Lai Jiangshan wrote:
> > On Tue, Dec 13, 2022 at 1:47 AM Sean Christopherson <seanjc@google.com> wrote:
> > >
> > > My preference would be to leave .smm in x86's page role. IMO, defining multiple
> > > address spaces to support SMM emulation was a mistake that should be contained to
> > > SMM, i.e. should never be used for any other feature. And with CONFIG_KVM_SMM,
> > > even x86 can opt out.
> >
> > I think the name ASID in kvm/x86 should be used for vmcb's ASID,
> > vmcs's VPID, and PCID. Using the name ASID for other purposes
> > would only result in unnecessary confusion.
>
> I agree in principle, but at this point fixing the problem would require a lot of
> churn since "as_id" is pervasive throughout the memslots code.
>
> It should be possible though, as I don't think anything in KVM's uAPI explicitly
> takes an as_id, i.e. it's KVM-internal terminology for the most part.
>
> > There is a bug in shadow paging when it uses two separate sets
> > of memslots, which use two sets of rmap and page-tracking.
> >
> > When the SMM world writes to a non-SMM page which happens to be
> > a guest pagetable in the non-SMM world, the write operation goes
> > through without being specially handled, and the shadow page for
> > the guest pagetable is neither unshadowed nor marked unsync. The
> > shadow paging code is unaware that the shadow page has deviated
> > from the guest pagetable.
>
> Won't the unsync code work as intended? for_each_gfn_valid_sp_with_gptes()
> doesn't consume the slot and I don't see any explicit filtering on role.smm,
> i.e. should unsync all SPTEs for the gfn.
>
> Addressing the page-track case is a bit gross, but doesn't appear to be too
> difficult. I wouldn't be surprised if there are other SMM => non-SMM bugs lurking
> though.
>
> ---
>  arch/x86/include/asm/kvm_page_track.h |  2 +-
>  arch/x86/kvm/mmu/mmu.c                |  7 +++---
>  arch/x86/kvm/mmu/mmu_internal.h       |  3 ++-
>  arch/x86/kvm/mmu/page_track.c         | 32 +++++++++++++++++++--------
>  arch/x86/kvm/mmu/spte.c               |  2 +-
>  5 files changed, 31 insertions(+), 15 deletions(-)

Could you send the patch in a new thread, please? I will add my
reviewed-by then.

It still lacks the parts that do write protection for sp->gfn.
kvm_vcpu_write_protect_gfn() has to handle the two worlds.
account_shadowed() and kvm_slot_page_track_add_page() also have
to handle the two worlds.

And I don't think there is any page-table in SMM-world, so
kvm_slot_page_track_is_active() can just skip the SMM-world and
check the non-SMM world only.
Thanks
Lai

[...]
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index aa4eb8cfcd7e..0a819d40131a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -348,7 +348,7 @@ union kvm_mmu_page_role {
                 * simple shift. While there is room, give it a whole
                 * byte so it is also faster to load it from memory.
                 */
-               unsigned smm:8;
+               unsigned as_id:8;
        };
 };

@@ -2056,7 +2056,7 @@ enum {
 # define __KVM_VCPU_MULTIPLE_ADDRESS_SPACE
 # define KVM_ADDRESS_SPACE_NUM 2
 # define kvm_arch_vcpu_memslots_id(vcpu) ((vcpu)->arch.hflags & HF_SMM_MASK ? 1 : 0)
-# define kvm_memslots_for_spte_role(kvm, role) __kvm_memslots(kvm, (role).smm)
+# define kvm_memslots_for_spte_role(kvm, role) __kvm_memslots(kvm, (role).as_id)
 #else
 # define kvm_memslots_for_spte_role(kvm, role) __kvm_memslots(kvm, 0)
 #endif
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 4d188f056933..f375b719f565 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5056,7 +5056,7 @@ kvm_calc_cpu_role(struct kvm_vcpu *vcpu, const struct kvm_mmu_role_regs *regs)
        union kvm_cpu_role role = {0};

        role.base.access = ACC_ALL;
-       role.base.smm = is_smm(vcpu);
+       role.base.as_id = is_smm(vcpu);
        role.base.guest_mode = is_guest_mode(vcpu);
        role.ext.valid = 1;

@@ -5112,7 +5112,7 @@ kvm_calc_tdp_mmu_root_page_role(struct kvm_vcpu *vcpu,
        role.access = ACC_ALL;
        role.cr0_wp = true;
        role.efer_nx = true;
-       role.smm = cpu_role.base.smm;
+       role.as_id = cpu_role.base.as_id;
        role.guest_mode = cpu_role.base.guest_mode;
        role.ad_disabled = !kvm_ad_enabled();
        role.level = kvm_mmu_get_tdp_level(vcpu);
@@ -5233,7 +5233,7 @@ kvm_calc_shadow_ept_root_page_role(struct kvm_vcpu *vcpu, bool accessed_dirty,

        /*
         * KVM does not support SMM transfer monitors, and consequently does not
-        * support the "entry to SMM" control either. role.base.smm is always 0.
+        * support the "entry to SMM" control either. role.base.as_id is always 0.
         */
        WARN_ON_ONCE(is_smm(vcpu));
        role.base.level = level;
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index ac00bfbf32f6..5427f65117b4 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -133,16 +133,6 @@ struct kvm_mmu_page {

 extern struct kmem_cache *mmu_page_header_cache;

-static inline int kvm_mmu_role_as_id(union kvm_mmu_page_role role)
-{
-       return role.smm ? 1 : 0;
-}
-
-static inline int kvm_mmu_page_as_id(struct kvm_mmu_page *sp)
-{
-       return kvm_mmu_role_as_id(sp->role);
-}
-
 static inline bool kvm_mmu_page_ad_need_write_protect(struct kvm_mmu_page *sp)
 {
        /*
diff --git a/arch/x86/kvm/mmu/tdp_iter.c b/arch/x86/kvm/mmu/tdp_iter.c
index 39b48e7d7d1a..4a7d58bf81c4 100644
--- a/arch/x86/kvm/mmu/tdp_iter.c
+++ b/arch/x86/kvm/mmu/tdp_iter.c
@@ -52,7 +52,7 @@ void tdp_iter_start(struct tdp_iter *iter, struct kvm_mmu_page *root,
        iter->root_level = root_level;
        iter->min_level = min_level;
        iter->pt_path[iter->root_level - 1] = (tdp_ptep_t)root->spt;
-       iter->as_id = kvm_mmu_page_as_id(root);
+       iter->as_id = root->role.as_id;

        tdp_iter_restart(iter);
 }
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 764f7c87286f..7ccac1aa8df6 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -237,7 +237,7 @@ static struct kvm_mmu_page *tdp_mmu_next_root(struct kvm *kvm,
             _root;                                                             \
             _root = tdp_mmu_next_root(_kvm, _root, _shared, _only_valid))      \
                if (kvm_lockdep_assert_mmu_lock_held(_kvm, _shared) &&          \
-                   kvm_mmu_page_as_id(_root) != _as_id) {                      \
+                   _root->role.as_id != _as_id) {                              \
                } else

 #define for_each_valid_tdp_mmu_root_yield_safe(_kvm, _root, _as_id, _shared)   \
@@ -256,7 +256,7 @@ static struct kvm_mmu_page *tdp_mmu_next_root(struct kvm *kvm,
 #define for_each_tdp_mmu_root(_kvm, _root, _as_id)                     \
        list_for_each_entry(_root, &_kvm->arch.tdp_mmu_roots, link)     \
                if (kvm_lockdep_assert_mmu_lock_held(_kvm, false) &&    \
-                   kvm_mmu_page_as_id(_root) != _as_id) {              \
+                   _root->role.as_id != _as_id) {                      \
                } else

 static struct kvm_mmu_page *tdp_mmu_alloc_sp(struct kvm_vcpu *vcpu)
@@ -310,7 +310,7 @@ hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu)
         * Check for an existing root before allocating a new one. Note, the
         * role check prevents consuming an invalid root.
         */
-       for_each_tdp_mmu_root(kvm, root, kvm_mmu_role_as_id(role)) {
+       for_each_tdp_mmu_root(kvm, root, role.as_id) {
                if (root->role.word == role.word &&
                    kvm_tdp_mmu_get_root(root))
                        goto out;
@@ -496,8 +496,8 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
                        old_spte = kvm_tdp_mmu_write_spte(sptep, old_spte,
                                                          REMOVED_SPTE, level);
                }
-               handle_changed_spte(kvm, kvm_mmu_page_as_id(sp), gfn,
-                                   old_spte, REMOVED_SPTE, level, shared);
+               handle_changed_spte(kvm, sp->role.as_id, gfn, old_spte,
+                                   REMOVED_SPTE, level, shared);
        }

        call_rcu(&sp->rcu_head, tdp_mmu_free_sp_rcu_callback);
@@ -923,7 +923,7 @@ bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
        if (WARN_ON_ONCE(!is_shadow_present_pte(old_spte)))
                return false;

-       __tdp_mmu_set_spte(kvm, kvm_mmu_page_as_id(sp), sp->ptep, old_spte, 0,
+       __tdp_mmu_set_spte(kvm, sp->role.as_id, sp->ptep, old_spte, 0,
                           sp->gfn, sp->role.level + 1, true, true);

        return true;
Rename kvm_mmu_page_role.smm with kvm_mmu_page_role.as_id and use it
directly as the address space ID throughout the KVM MMU code. This
eliminates a needless level of indirection, kvm_mmu_role_as_id(), and
prepares for making kvm_mmu_page_role architecture-neutral.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/x86/include/asm/kvm_host.h |  4 ++--
 arch/x86/kvm/mmu/mmu.c          |  6 +++---
 arch/x86/kvm/mmu/mmu_internal.h | 10 ----------
 arch/x86/kvm/mmu/tdp_iter.c     |  2 +-
 arch/x86/kvm/mmu/tdp_mmu.c      | 12 ++++++------
 5 files changed, 12 insertions(+), 22 deletions(-)
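A condensed, purely illustrative before/after of the rename (the real union
kvm_mmu_page_role has many more fields; see the full diff above):

/* Field widths and neighbors simplified; illustration, not the kernel types. */
union role_before {
        struct {
                unsigned level:4;
                unsigned smm:8;         /* "is this the SMM world?" */
        };
        unsigned word;
};

union role_after {
        struct {
                unsigned level:4;
                unsigned as_id:8;       /* the memslot address space ID itself */
        };
        unsigned word;
};

/* Before the patch, common code needed this hop (removed from mmu_internal.h): */
static inline int role_as_id_before(union role_before role)
{
        return role.smm ? 1 : 0;
}

/* After the patch, callers read the field directly, e.g. in tdp_iter_start():
 *      iter->as_id = root->role.as_id;
 */
static inline int role_as_id_after(union role_after role)
{
        return role.as_id;
}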