Message ID: 20220630135747.26983-1-will@kernel.org (mailing list archive)
Series: KVM: arm64: Introduce pKVM shadow state at EL2
On Thu, Jun 30, 2022, Will Deacon wrote:
> Hi everyone,
>
> This series has been extracted from the pKVM base support series (aka
> "pKVM mega-patch") previously posted here:
>
>   https://lore.kernel.org/kvmarm/20220519134204.5379-1-will@kernel.org/
>
> Unlike that more comprehensive series, this one is fairly fundamental and
> does not introduce any new ABI commitments, leaving questions involving
> the management of guest private memory and the creation of protected VMs
> for future work. Instead, this series extends the pKVM EL2 code so that
> it can dynamically instantiate and manage VM shadow structures without
> the host being able to access them directly. These shadow structures
> consist of a shadow VM, a set of shadow vCPUs and the stage-2 page-table,
> and the pages used to hold them are returned to the host when the VM is
> destroyed.
>
> The last patch is marked as RFC because, although it plumbs in the shadow
> state, it is woefully inefficient and copies to/from the host state on
> every vCPU run. Without the last patch, the new structures are unused but
> we move considerably closer to isolating guests from the host.

...

>  arch/arm64/include/asm/kvm_asm.h              |   6 +-
>  arch/arm64/include/asm/kvm_host.h             |  65 +++
>  arch/arm64/include/asm/kvm_hyp.h              |   3 +
>  arch/arm64/include/asm/kvm_pgtable.h          |   8 +
>  arch/arm64/include/asm/kvm_pkvm.h             |  38 ++
>  arch/arm64/kernel/image-vars.h                |  15 -
>  arch/arm64/kvm/arm.c                          |  40 +-
>  arch/arm64/kvm/hyp/hyp-constants.c            |   3 +
>  arch/arm64/kvm/hyp/include/nvhe/gfp.h         |   6 +-
>  arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  19 +-
>  arch/arm64/kvm/hyp/include/nvhe/memory.h      |  26 +-
>  arch/arm64/kvm/hyp/include/nvhe/mm.h          |  18 +-
>  arch/arm64/kvm/hyp/include/nvhe/pkvm.h        |  70 +++
>  arch/arm64/kvm/hyp/include/nvhe/spinlock.h    |  10 +-
>  arch/arm64/kvm/hyp/nvhe/cache.S               |  11 +
>  arch/arm64/kvm/hyp/nvhe/hyp-main.c            | 105 +++-
>  arch/arm64/kvm/hyp/nvhe/hyp-smp.c             |   2 +
>  arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 456 +++++++++++++++++-
>  arch/arm64/kvm/hyp/nvhe/mm.c                  | 136 +++++-
>  arch/arm64/kvm/hyp/nvhe/page_alloc.c          |  42 +-
>  arch/arm64/kvm/hyp/nvhe/pkvm.c                | 438 +++++++++++++++++
>  arch/arm64/kvm/hyp/nvhe/setup.c               |  96 ++--
>  arch/arm64/kvm/hyp/pgtable.c                  |   9 +
>  arch/arm64/kvm/mmu.c                          |  26 +
>  arch/arm64/kvm/pkvm.c                         | 121 ++++-
>  25 files changed, 1625 insertions(+), 144 deletions(-)
>  create mode 100644 arch/arm64/kvm/hyp/include/nvhe/pkvm.h

The lack of documentation and the rather terse changelogs make this really
hard to review for folks that aren't intimately familiar with pKVM. I have
a decent idea of the end goal of "shadowing", but that's mostly because of
my involvement in similar x86 projects. Nothing in the changelogs ever
explains _why_ pKVM uses shadows.

I put "shadowing" in quotes because if the untrusted host is aware that the
VM and vCPU it is manipulating aren't the "real" VMs/vCPUs, and there is an
explicit API between the untrusted host and pKVM for creating/destroying
VMs/vCPUs, then I would argue that it's not truly shadowing, especially if
pKVM uses data/values verbatim and only verifies correctness/safety. It's
definitely a nit, but for future readers I think overloading "shadowing"
could be confusing.

And beyond the basics, IMO pKVM needs a more formal definition of exactly
what guest state is protected/hidden from the untrusted host. Peeking at
the mega series, there are a huge pile of patches that result in "gradual
reduction of EL2 trust in host data", but I couldn't find any documentation
that defines what that end result is.
Hi Sean,

Thanks for having a look.

On Wed, Jul 06, 2022 at 07:17:29PM +0000, Sean Christopherson wrote:
> On Thu, Jun 30, 2022, Will Deacon wrote:
> > This series has been extracted from the pKVM base support series (aka
> > "pKVM mega-patch") previously posted here:
> >
> >   https://lore.kernel.org/kvmarm/20220519134204.5379-1-will@kernel.org/
> >
> > [...]
>
> The lack of documentation and the rather terse changelogs make this really
> hard to review for folks that aren't intimately familiar with pKVM. I have
> a decent idea of the end goal of "shadowing", but that's mostly because of
> my involvement in similar x86 projects. Nothing in the changelogs ever
> explains _why_ pKVM uses shadows.

That's understandable, but thanks for persevering; this series is pretty
deep in the murky depths of the arm64 architecture and EL2 code, so it
doesn't really map to the KVM code most folks are familiar with. It's fair
to say we're assuming a lot of niche prior knowledge (which is quite common
for arch code in my experience), but I wanted to inherit the broader cc
list so you were aware of this break-away series.

Sadly, I don't think beefing up the commit messages would get us to a point
where somebody not already familiar with the EL2 code could give a
constructive review, but we can try to expand them a bit if you genuinely
think it would help.

On the more positive side, we'll be speaking at KVM Forum about what we've
done here, so that will be a great place to discuss it further, and then we
can also link back to the recordings in later postings of the mega-series.

> I put "shadowing" in quotes because if the untrusted host is aware that the
> VM and vCPU it is manipulating aren't the "real" VMs/vCPUs, and there is an
> explicit API between the untrusted host and pKVM for creating/destroying
> VMs/vCPUs, then I would argue that it's not truly shadowing, especially if
> pKVM uses data/values verbatim and only verifies correctness/safety. It's
> definitely a nit, but for future readers I think overloading "shadowing"
> could be confusing.

Ah, this is really interesting and nicely puts the ball back in my court,
as I'm not well versed in x86's use of "shadowing". We should probably use
another name (ideas?), but our "shadow" is very much explicit -- rather
than the host using its 'struct kvm's and 'struct kvm_vcpu's directly, it
instead passes those data structures to the hypervisor, which allocates its
own structures (in some cases reusing the host definitions directly, e.g.
'struct kvm_vcpu') and returns a handle to the host as a reference. The
host can then issue hypercalls with this handle to load/put vCPUs of that
VM, run them once they are loaded, and synchronise aspects of the vCPU
state between the host and the hypervisor copies for things like emulation
traps or interrupt injection. The main thing is that the pages containing
the hypervisor structures are not accessible by the host until the
corresponding VM is destroyed.

The advantage of doing it this way is that we don't need to change very
much of the host KVM code at all, and we can even reuse some of it directly
in the hypervisor (e.g. inline functions and macros).

Perhaps we should s/shadow/hyp/ to make this a little clearer?

> And beyond the basics, IMO pKVM needs a more formal definition of exactly
> what guest state is protected/hidden from the untrusted host. Peeking at
> the mega series, there are a huge pile of patches that result in "gradual
> reduction of EL2 trust in host data", but I couldn't find any documentation
> that defines what that end result is.

That's fair; I'll work to extend the documentation in the next iteration of
the mega-series to cover this in more detail. Roughly speaking, the end
result is that the vCPU register and memory state is inaccessible to the
host except in cases where the guest has done something to expose it, such
as MMIO or a memory-sharing hypercall. Given the complexity of the register
state (GPRs, floating point, SIMD, debug, etc.), the mega-series elevates
portions of the state from the host to the hypervisor as separate patches
to structure things a bit better (that's where the gradual reduction comes
in).

Does that help at all?

Cheers,

Will
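To make the handle-based lifecycle Will describes concrete, below is a
minimal host-side sketch in C. It is an illustration only: the hypercall
names (__pkvm_vcpu_load, __pkvm_vcpu_put) and the 'arch.pkvm.handle' field
are assumptions invented for this sketch, not the series' actual interface;
only kvm_call_hyp_nvhe() is the existing KVM/arm64 mechanism for calling
into the EL2 code.

/*
 * Sketch: once the host has registered a VM with the hypervisor and
 * received an opaque handle, every subsequent operation on the protected
 * VM goes through that handle. All hypercall names here are hypothetical.
 */
static int host_run_protected_vcpu(struct kvm *host_kvm,
				   struct kvm_vcpu *host_vcpu)
{
	pkvm_handle_t handle = host_kvm->arch.pkvm.handle;
	int ret;

	/* Ask EL2 to load its private copy of this vCPU... */
	ret = kvm_call_hyp_nvhe(__pkvm_vcpu_load, handle, host_vcpu->vcpu_idx);
	if (ret)
		return ret;

	/* ...run it; EL2 decides which state is copied in and out... */
	ret = kvm_call_hyp_nvhe(__kvm_vcpu_run, host_vcpu);

	/* ...and drop the reference when done. */
	kvm_call_hyp_nvhe(__pkvm_vcpu_put, handle, host_vcpu->vcpu_idx);

	return ret;
}

Sean sketches the EL2 side of this same lifecycle later in the thread.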
On Thu, Jun 30, 2022 at 02:57:23PM +0100, Will Deacon wrote:
> Hi everyone,
>
> This series has been extracted from the pKVM base support series (aka
> "pKVM mega-patch") previously posted here:
>
>   https://lore.kernel.org/kvmarm/20220519134204.5379-1-will@kernel.org/
>
> [...]
>
> The series is based on Marc's rework of the flags
> (kvm-arm64/burn-the-flags).
>
> Feedback welcome.
>
> Cheers,

Only had a few nitpicks.

Reviewed-by: Vincent Donnefort <vdonnefort@google.com>

Also, I've been using this patchset for quite a while now.

Tested-by: Vincent Donnefort <vdonnefort@google.com>

[...]
Apologies for the slow reply.

On Fri, Jul 08, 2022, Will Deacon wrote:
> On Wed, Jul 06, 2022 at 07:17:29PM +0000, Sean Christopherson wrote:
> > On Thu, Jun 30, 2022, Will Deacon wrote:
> > The lack of documentation and the rather terse changelogs make this really
> > hard to review for folks that aren't intimately familiar with pKVM. I have
> > a decent idea of the end goal of "shadowing", but that's mostly because of
> > my involvement in similar x86 projects. Nothing in the changelogs ever
> > explains _why_ pKVM uses shadows.
>
> That's understandable, but thanks for persevering; this series is pretty
> deep in the murky depths of the arm64 architecture and EL2 code, so it
> doesn't really map to the KVM code most folks are familiar with. It's fair
> to say we're assuming a lot of niche prior knowledge (which is quite common
> for arch code in my experience),

Assuming prior knowledge is fine so long as that prior knowledge is
something that can be gained by future readers through public
documentation. E.g. arch code is always littered with acronyms, but someone
can almost always decipher the acronyms by reading specs or just searching
the interwebs.

My objection to the changelogs is that they talk about "shadow" VMs, vCPUs,
state, structures, etc., without ever explaining what a shadow is, how it
will be used, or what its end purpose is. A big part of the problem is that
"shadow" is not unique terminology _and_ isn't inherently tied to
"protected kvm", e.g. a reader can't intuit that a "shadow vm" is the
trusted hypervisor's instance of the VM. I can search for pKVM and get
relevant, helpful search results, but if I search for "shadow vm", I get
unrelated results, and "pkvm shadow vm" just leads me back to this series.

> but I wanted to inherit the broader cc list so you were aware of this
> break-away series. Sadly, I don't think beefing up the commit messages
> would get us to a point where somebody not already familiar with the EL2
> code could give a constructive review, but we can try to expand them a bit
> if you genuinely think it would help.

I'm not looking at it just from a review point of view, but also from a
future reader's perspective. E.g. someone that looks at this changelog in
isolation is going to have no idea what a "shadow VM" is:

  KVM: arm64: Introduce pKVM shadow VM state at EL2

  Introduce a table of shadow VM structures at EL2 and provide hypercalls
  to the host for creating and destroying shadow VMs.

Obviously there will be some context available in surrounding patches, but
if you avoid the "shadow" terminology and provide a bit more context, then
it yields something like:

  KVM: arm64: Add infrastructure to create and track pKVM instances at EL2

  Introduce a global table (and lock) to track pKVM instances at EL2, and
  provide hypercalls that can be used by the untrusted host to create and
  destroy pKVM VMs. pKVM VM/vCPU state is directly accessible only by the
  trusted hypervisor (EL2).

  Each pKVM VM is directly associated with an untrusted host KVM instance,
  and is referenced by the host using an opaque handle. Future patches will
  provide hypercalls to allow the host to initialize/set/get pKVM VM/vCPU
  state using the opaque handle.

> On the more positive side, we'll be speaking at KVM Forum about what we've
> done here, so that will be a great place to discuss it further, and then we
> can also link back to the recordings in later postings of the mega-series.

> > I put "shadowing" in quotes because if the untrusted host is aware that
> > the VM and vCPU it is manipulating aren't the "real" VMs/vCPUs, and there
> > is an explicit API between the untrusted host and pKVM for
> > creating/destroying VMs/vCPUs, then I would argue that it's not truly
> > shadowing, especially if pKVM uses data/values verbatim and only verifies
> > correctness/safety. It's definitely a nit, but for future readers I think
> > overloading "shadowing" could be confusing.
>
> Ah, this is really interesting and nicely puts the ball back in my court,
> as I'm not well versed in x86's use of "shadowing".

It's not just x86, e.g. see https://en.wikipedia.org/wiki/Shadow_table.
The use in pKVM is _really_ close in that what pKVM calls the shadow is the
"real" data that's used, but pKVM inverts the typical virtualization usage,
which is why I find it confusing. I.e. instead of shadowing state written
by the guest, pKVM is "shadowing" state written by the host. If there ever
comes a need to actually shadow guest state, e.g. for nested
virtualization, then using shadow to refer to the protected state is going
to create a conundrum.

Honestly, I think pKVM is simply being too cute in picking names. And not
just for "shadow", e.g. IMO the flush/sync terminology in patch 24 is also
unnecessarily cute. Instead of coming up with clever names, just be
explicit in what the code is doing. E.g. something like:

  flush_shadow_state() => sync_host_to_pkvm_vcpu()
  sync_shadow_state()  => sync_pkvm_to_host_vcpu()

Then readers know the two functions are pairs, and will have a decent idea
of what the functions do even if they don't fully understand pKVM vs. host.

"shadow_area_size" is another case where it's unnecessarily cryptic, e.g.
just call it "donated_memory_size".

> Perhaps we should s/shadow/hyp/ to make this a little clearer?

Or maybe just "pkvm"? I think that's especially viable if you do away with
kvm_shadow_vcpu_state. As of this series at least, kvm_shadow_vcpu_state is
completely unnecessary. kvm_vcpu.kvm can be used to get at the VM, and thus
pKVM state via container_of(). Then the host_vcpu can be retrieved by using
the vcpu_idx, e.g.

	struct pkvm_vm *pkvm_vm = to_pkvm_vm(pkvm_vcpu->kvm);
	struct kvm_vcpu *host_vcpu;

	host_vcpu = kvm_get_vcpu(pkvm_vm->host_vm, pkvm_vcpu->vcpu_idx);

Even better is to not have to do that in the first place. AFAICT, there's
no need to do such a lookup in handle___kvm_vcpu_run() since pKVM already
has pointers to both the host vCPU and the pKVM vCPU. E.g. I believe you
can make the code look like this:

struct kvm_arch {
	...

	/*
	 * For an untrusted host VM, pkvm_handle is used to lookup the
	 * associated pKVM instance.
	 */
	pkvm_handle_t pkvm_handle;
};

struct pkvm_vm {
	struct kvm kvm;

	/* Backpointer to the host's (untrusted) KVM instance. */
	struct kvm *host_kvm;

	size_t donated_memory_size;

	struct kvm_pgtable pgt;
};

static struct kvm *pkvm_get_vm(pkvm_handle_t handle)
{
	unsigned int idx = pkvm_handle_to_idx(handle);

	if (unlikely(idx >= KVM_MAX_PVMS))
		return NULL;

	return pkvm_vm_table[idx];
}

struct kvm_vcpu *pkvm_vcpu_load(pkvm_handle_t handle, unsigned int vcpu_idx)
{
	struct kvm_vcpu *pkvm_vcpu = NULL;
	struct kvm *vm;

	hyp_spin_lock(&pkvm_global_lock);
	vm = pkvm_get_vm(handle);
	if (!vm || atomic_read(&vm->online_vcpus) <= vcpu_idx)
		goto unlock;

	pkvm_vcpu = kvm_get_vcpu(vm, vcpu_idx);
	hyp_page_ref_inc(hyp_virt_to_page(vm));
unlock:
	hyp_spin_unlock(&pkvm_global_lock);
	return pkvm_vcpu;
}

void pkvm_vcpu_put(struct kvm_vcpu *pkvm_vcpu)
{
	hyp_spin_lock(&pkvm_global_lock);
	hyp_page_ref_dec(hyp_virt_to_page(pkvm_vcpu->kvm));
	hyp_spin_unlock(&pkvm_global_lock);
}

static void sync_host_to_pkvm_vcpu(struct kvm_vcpu *pkvm_vcpu,
				   struct kvm_vcpu *host_vcpu)
{
	pkvm_vcpu->arch.ctxt = host_vcpu->arch.ctxt;

	pkvm_vcpu->arch.sve_state = kern_hyp_va(host_vcpu->arch.sve_state);
	pkvm_vcpu->arch.sve_max_vl = host_vcpu->arch.sve_max_vl;

	pkvm_vcpu->arch.hw_mmu = host_vcpu->arch.hw_mmu;

	pkvm_vcpu->arch.hcr_el2 = host_vcpu->arch.hcr_el2;
	pkvm_vcpu->arch.mdcr_el2 = host_vcpu->arch.mdcr_el2;
	pkvm_vcpu->arch.cptr_el2 = host_vcpu->arch.cptr_el2;

	pkvm_vcpu->arch.iflags = host_vcpu->arch.iflags;
	pkvm_vcpu->arch.fp_state = host_vcpu->arch.fp_state;

	pkvm_vcpu->arch.debug_ptr = kern_hyp_va(host_vcpu->arch.debug_ptr);
	pkvm_vcpu->arch.host_fpsimd_state = host_vcpu->arch.host_fpsimd_state;

	pkvm_vcpu->arch.vsesr_el2 = host_vcpu->arch.vsesr_el2;

	pkvm_vcpu->arch.vgic_cpu.vgic_v3 = host_vcpu->arch.vgic_cpu.vgic_v3;
}

static void sync_pkvm_to_host_vcpu(struct kvm_vcpu *host_vcpu,
				   struct kvm_vcpu *pkvm_vcpu)
{
	struct vgic_v3_cpu_if *pkvm_cpu_if = &pkvm_vcpu->arch.vgic_cpu.vgic_v3;
	struct vgic_v3_cpu_if *host_cpu_if = &host_vcpu->arch.vgic_cpu.vgic_v3;
	unsigned int i;

	host_vcpu->arch.ctxt = pkvm_vcpu->arch.ctxt;

	host_vcpu->arch.hcr_el2 = pkvm_vcpu->arch.hcr_el2;
	host_vcpu->arch.cptr_el2 = pkvm_vcpu->arch.cptr_el2;

	host_vcpu->arch.fault = pkvm_vcpu->arch.fault;

	host_vcpu->arch.iflags = pkvm_vcpu->arch.iflags;
	host_vcpu->arch.fp_state = pkvm_vcpu->arch.fp_state;

	host_cpu_if->vgic_hcr = pkvm_cpu_if->vgic_hcr;
	for (i = 0; i < pkvm_cpu_if->used_lrs; ++i)
		host_cpu_if->vgic_lr[i] = pkvm_cpu_if->vgic_lr[i];
}

static void handle___kvm_vcpu_run(struct kvm_cpu_context *host_ctxt)
{
	DECLARE_REG(struct kvm_vcpu *, host_vcpu, host_ctxt, 1);
	int ret;

	host_vcpu = kern_hyp_va(host_vcpu);

	if (unlikely(is_protected_kvm_enabled())) {
		struct kvm *host_kvm = kern_hyp_va(host_vcpu->kvm);
		struct kvm_vcpu *pkvm_vcpu;

		pkvm_vcpu = pkvm_vcpu_load(host_kvm->arch.pkvm_handle,
					   host_vcpu->vcpu_idx);
		if (!pkvm_vcpu) {
			ret = -EINVAL;
			goto out;
		}

		sync_host_to_pkvm_vcpu(pkvm_vcpu, host_vcpu);

		ret = __kvm_vcpu_run(pkvm_vcpu);

		sync_pkvm_to_host_vcpu(host_vcpu, pkvm_vcpu);

		pkvm_vcpu_put(pkvm_vcpu);
	} else {
		ret = __kvm_vcpu_run(host_vcpu);
	}
out:
	cpu_reg(host_ctxt, 1) = ret;
}
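For reference, the to_pkvm_vm() helper assumed in the snippet above is not
defined anywhere in the thread; it would presumably be a one-line
container_of() wrapper along these lines (an editor-provided guess):

/*
 * Recover the enclosing pKVM wrapper from its embedded struct kvm.
 * Hypothetical, matching the struct pkvm_vm sketch above.
 */
static inline struct pkvm_vm *to_pkvm_vm(struct kvm *kvm)
{
	return container_of(kvm, struct pkvm_vm, kvm);
}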
On Tue, 19 Jul 2022 17:11:32 +0100, Sean Christopherson <seanjc@google.com> wrote:
> Honestly, I think pKVM is simply being too cute in picking names.

I don't know what you mean by "cute" here, but I assume this is not exactly
a flattering qualifier.

> And not just for "shadow", e.g. IMO the flush/sync terminology in patch 24
> is also unnecessarily cute. Instead of coming up with clever names, just
> be explicit in what the code is doing. E.g. something like:
>
>   flush_shadow_state() => sync_host_to_pkvm_vcpu()
>   sync_shadow_state()  => sync_pkvm_to_host_vcpu()

As much as I like bikeshedding, this isn't going to happen. We have had the
sync/flush duality since day one, we have a lot of code based around this
naming, and departing from it seems counter-productive.

	M.
Hi Sean,

On Tue, Jul 19, 2022 at 04:11:32PM +0000, Sean Christopherson wrote:
> Apologies for the slow reply.

No problem; you've provided a tonne of insightful feedback here, so it was
worth the wait. Thanks!

> On Fri, Jul 08, 2022, Will Deacon wrote:
> > but I wanted to inherit the broader cc list so you were aware of this
> > break-away series. Sadly, I don't think beefing up the commit messages
> > would get us to a point where somebody not already familiar with the EL2
> > code could give a constructive review, but we can try to expand them a
> > bit if you genuinely think it would help.
>
> I'm not looking at it just from a review point of view, but also from a
> future reader's perspective. E.g. someone that looks at this changelog in
> isolation is going to have no idea what a "shadow VM" is:
>
>   KVM: arm64: Introduce pKVM shadow VM state at EL2
>
> [...]
>
> Obviously there will be some context available in surrounding patches, but
> if you avoid the "shadow" terminology and provide a bit more context, then
> it yields something like:
>
>   KVM: arm64: Add infrastructure to create and track pKVM instances at EL2
>
> [...]

Thanks, that's much better. I'll have to summon up the energy to go through
the others as well...

> > Perhaps we should s/shadow/hyp/ to make this a little clearer?
>
> Or maybe just "pkvm"?

I think the "hyp" part is useful to distinguish the pkvm code running at
EL2 from the pkvm code running at EL1. For example, we have a 'pkvm' member
in 'struct kvm_arch' which is used by the _host_ at EL1.

So I'd say either "pkvm_hyp" or "hyp" instead of "shadow". The latter is
nice and short...

> I think that's especially viable if you do away with kvm_shadow_vcpu_state.
> As of this series at least, kvm_shadow_vcpu_state is completely
> unnecessary. kvm_vcpu.kvm can be used to get at the VM, and thus pKVM state
> via container_of(). Then the host_vcpu can be retrieved by using the
> vcpu_idx, e.g.
>
> 	struct pkvm_vm *pkvm_vm = to_pkvm_vm(pkvm_vcpu->kvm);
> 	struct kvm_vcpu *host_vcpu;
>
> 	host_vcpu = kvm_get_vcpu(pkvm_vm->host_vm, pkvm_vcpu->vcpu_idx);

Using container_of() here is neat; we can definitely go ahead with that
change. However, looking at this in more detail with Fuad, removing
'struct kvm_shadow_vcpu_state' entirely isn't going to work:

> E.g. I believe you can make the code look like this:
>
> [...]
>
> struct kvm_vcpu *pkvm_vcpu_load(pkvm_handle_t handle, unsigned int vcpu_idx)
> {
> 	struct kvm_vcpu *pkvm_vcpu = NULL;
> 	struct kvm *vm;
>
> 	hyp_spin_lock(&pkvm_global_lock);
> 	vm = pkvm_get_vm(handle);
> 	if (!vm || atomic_read(&vm->online_vcpus) <= vcpu_idx)
> 		goto unlock;
>
> 	pkvm_vcpu = kvm_get_vcpu(vm, vcpu_idx);

kvm_get_vcpu() makes use of an xarray to hold the vCPU pointers and this is
really something which we cannot support at EL2 where, amongst other
things, we do not have support for RCU. Consequently, we do need to keep
our own mapping from the shad^H^H^H^Hhyp vCPU to the host vCPU.

We also end up expanding the 'struct kvm_shadow_vcpu_state' structure later
to track additional vCPU state in the hypervisor, for example in the
mega-series:

https://lore.kernel.org/kvmarm/20220519134204.5379-78-will@kernel.org/#Z31arch:arm64:kvm:hyp:include:nvhe:pkvm.h

Cheers,

Will
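To illustrate the constraint Will describes, here is a hypothetical sketch
of EL2-friendly vCPU bookkeeping: a plain array indexed by vcpu_idx instead
of kvm_get_vcpu()'s xarray (which depends on RCU that the hypervisor does
not have). The field and type names are illustrative, loosely modelled on
the mega-series' 'struct kvm_shadow_vcpu_state', not copied from it.

struct kvm_shadow_vcpu_state {
	struct kvm_vcpu shadow_vcpu;	/* the hypervisor's private copy */
	struct kvm_vcpu *host_vcpu;	/* back-pointer, fixed at VM creation */
};

struct kvm_shadow_vm {
	struct kvm kvm;
	struct kvm *host_kvm;
	unsigned int created_vcpus;
	/* Flexible array, sized from the pages the host donates at VM init. */
	struct kvm_shadow_vcpu_state vcpu_states[];
};

static struct kvm_shadow_vcpu_state *
get_shadow_vcpu_state(struct kvm_shadow_vm *vm, unsigned int vcpu_idx)
{
	if (vcpu_idx >= vm->created_vcpus)
		return NULL;

	return &vm->vcpu_states[vcpu_idx];
}

A fixed-size, contiguously donated table keeps the lookup O(1) and avoids
any dependency on an allocator or RCU at EL2.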
On Wed, Jul 20, 2022, Will Deacon wrote:
> Hi Sean,
>
> On Tue, Jul 19, 2022 at 04:11:32PM +0000, Sean Christopherson wrote:
> > Or maybe just "pkvm"?
>
> I think the "hyp" part is useful to distinguish the pkvm code running at
> EL2 from the pkvm code running at EL1. For example, we have a 'pkvm' member
> in 'struct kvm_arch' which is used by the _host_ at EL1.

Right, my suggestion was to rename that to pkvm_handle to avoid a direct
conflict, and then that naturally yields the "pkvm_handle => pkvm_vm"
association. Or are you expecting to shove more stuff into that "pkvm"
struct?

> So I'd say either "pkvm_hyp" or "hyp" instead of "shadow". The latter is
> nice and short...

I 100% agree that differentiating between EL1 and EL2 is important for
functions, structs and global variables, but I would argue it's not so
important for fields and local variables, where the "owning" struct/function
provides that context. But that's actually a partial argument for just
using "hyp".

My concern with just using e.g. "kvm_hyp" is that, because non-pKVM nVHE
also has the host vs. hyp split, it could lead people to believe that
"kvm_hyp" is also used for the non-pKVM case.

So, what about a blend? E.g. "struct pkvm_hyp_vcpu *hyp_vcpu". That
provides the context that the struct is specific to the EL2 side of pKVM,
most usage is nice and short, and the "hyp" prefix avoids the ambiguity
that a bare "pkvm" would suffer for EL1 vs. EL2.

Doesn't look awful?

static void handle___kvm_vcpu_run(struct kvm_cpu_context *host_ctxt)
{
	DECLARE_REG(struct kvm_vcpu *, host_vcpu, host_ctxt, 1);
	int ret;

	host_vcpu = kern_hyp_va(host_vcpu);

	if (unlikely(is_protected_kvm_enabled())) {
		struct pkvm_hyp_vcpu *hyp_vcpu;
		struct kvm *host_kvm;

		host_kvm = kern_hyp_va(host_vcpu->kvm);
		hyp_vcpu = pkvm_load_hyp_vcpu(host_kvm->arch.pkvm.handle,
					      host_vcpu->vcpu_idx);
		if (!hyp_vcpu) {
			ret = -EINVAL;
			goto out;
		}

		flush_pkvm_guest_state(hyp_vcpu);

		ret = __kvm_vcpu_run(hyp_vcpu);

		sync_pkvm_guest_state(hyp_vcpu);

		pkvm_put_hyp_vcpu(hyp_vcpu);
	} else {
		/* The host is fully trusted, run its vCPU directly. */
		ret = __kvm_vcpu_run(host_vcpu);
	}
out:
	cpu_reg(host_ctxt, 1) = ret;
}

> > I think that's especially viable if you do away with
> > kvm_shadow_vcpu_state. [...]
>
> Using container_of() here is neat; we can definitely go ahead with that
> change. However, looking at this in more detail with Fuad, removing
> 'struct kvm_shadow_vcpu_state' entirely isn't going to work:
>
> > [...]
> >
> > 	pkvm_vcpu = kvm_get_vcpu(vm, vcpu_idx);
>
> kvm_get_vcpu() makes use of an xarray to hold the vCPU pointers and this is
> really something which we cannot support at EL2 where, amongst other
> things, we do not have support for RCU. Consequently, we do need to keep
> our own mapping from the shad^H^H^H^Hhyp vCPU to the host vCPU.

Hmm, are there guardrails in place to prevent using "unsafe" fields from
"struct kvm" and "struct kvm_vcpu" at EL2? If not, it seems like embedding
the common structs in the hyp/pkvm-specific structs is going to bite us in
the rear at some point.

Mostly out of curiosity, I assume the EL2 restriction only applies to nVHE
mode?

And waaaay off topic, has anyone explored adding macro magic to generate
wrappers to (un)marshall registers to parameters/returns for the hyp
functions? E.g. it'd be neat if you could make the code look like this
without having to add a wrapper for every function:

static int handle___kvm_vcpu_run(unsigned long __host_vcpu)
{
	struct kvm_vcpu *host_vcpu = kern_hyp_va((struct kvm_vcpu *)__host_vcpu);
	int ret;

	if (unlikely(is_protected_kvm_enabled())) {
		struct pkvm_hyp_vcpu *hyp_vcpu;
		struct kvm *host_kvm;

		host_kvm = kern_hyp_va(host_vcpu->kvm);
		hyp_vcpu = pkvm_load_hyp_vcpu(host_kvm->arch.pkvm.handle,
					      host_vcpu->vcpu_idx);
		if (!hyp_vcpu)
			return -EINVAL;

		flush_hypervisor_state(hyp_vcpu);

		ret = __kvm_vcpu_run(hyp_vcpu);

		sync_hypervisor_state(hyp_vcpu);

		pkvm_put_hyp_vcpu(hyp_vcpu);
	} else {
		/* The host is fully trusted, run its vCPU directly. */
		ret = __kvm_vcpu_run(host_vcpu);
	}

	return ret;
}
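On the "macro magic" question, something along these lines seems plausible,
building only on the DECLARE_REG()/cpu_reg() helpers that already exist in
arch/arm64/kvm/hyp/nvhe/hyp-main.c. The HYP_HANDLER_1 macro and its shape
are invented here for illustration; nothing like it exists in the series.

/*
 * Hypothetical: generate the hyp-main dispatch stub that unpacks one
 * register argument and writes the return value back, so the handler
 * body can be written with normal C parameters.
 */
#define HYP_HANDLER_1(name, type0, arg0)				\
	static int __##name(type0 arg0);				\
	static void handle_##name(struct kvm_cpu_context *host_ctxt)	\
	{								\
		DECLARE_REG(type0, arg0, host_ctxt, 1);			\
									\
		cpu_reg(host_ctxt, 1) = __##name(arg0);			\
	}								\
	static int __##name(type0 arg0)

/* Sean's example above could then be spelled: */
HYP_HANDLER_1(__kvm_vcpu_run, unsigned long, __host_vcpu)
{
	struct kvm_vcpu *host_vcpu = kern_hyp_va((struct kvm_vcpu *)__host_vcpu);

	/* ... body as in the sketch above ... */
	return __kvm_vcpu_run(host_vcpu);
}

The existing HANDLE_FUNC() dispatch table in hyp-main.c would be unchanged;
only the way each handle_##name stub's body is written would differ.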