Message ID | 20211013155831.943476-17-qperret@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | KVM: arm64: Implement unshare hypercall for pkvm | expand |
On Wed, 13 Oct 2021 16:58:31 +0100, Quentin Perret <qperret@google.com> wrote: > > Make use of the newly introduced unshare hypercall during guest teardown > to unmap guest-related data structures from the hyp stage-1. > > Signed-off-by: Quentin Perret <qperret@google.com> > --- > arch/arm64/include/asm/kvm_host.h | 2 ++ > arch/arm64/include/asm/kvm_mmu.h | 1 + > arch/arm64/kvm/arm.c | 2 ++ > arch/arm64/kvm/fpsimd.c | 10 ++++++++-- > arch/arm64/kvm/mmu.c | 16 ++++++++++++++++ > arch/arm64/kvm/reset.c | 13 ++++++++++++- > 6 files changed, 41 insertions(+), 3 deletions(-) > > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h > index f8be56d5342b..8b61cdcd1b29 100644 > --- a/arch/arm64/include/asm/kvm_host.h > +++ b/arch/arm64/include/asm/kvm_host.h > @@ -322,6 +322,8 @@ struct kvm_vcpu_arch { > > struct thread_info *host_thread_info; /* hyp VA */ > struct user_fpsimd_state *host_fpsimd_state; /* hyp VA */ > + struct thread_info *kern_thread_info; > + struct user_fpsimd_state *kern_fpsimd_state; > > struct { > /* {Break,watch}point registers */ > diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h > index 185d0f62b724..81839e9a8a24 100644 > --- a/arch/arm64/include/asm/kvm_mmu.h > +++ b/arch/arm64/include/asm/kvm_mmu.h > @@ -151,6 +151,7 @@ static __always_inline unsigned long __kern_hyp_va(unsigned long v) > #include <asm/stage2_pgtable.h> > > int kvm_share_hyp(void *from, void *to); > +void kvm_unshare_hyp(void *from, void *to); > int create_hyp_mappings(void *from, void *to, enum kvm_pgtable_prot prot); > int create_hyp_io_mappings(phys_addr_t phys_addr, size_t size, > void __iomem **kaddr, > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c > index f2e74635332b..f11c51db6fe6 100644 > --- a/arch/arm64/kvm/arm.c > +++ b/arch/arm64/kvm/arm.c > @@ -188,6 +188,8 @@ void kvm_arch_destroy_vm(struct kvm *kvm) > } > } > atomic_set(&kvm->online_vcpus, 0); > + > + kvm_unshare_hyp(kvm, kvm + 1); > } > > int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) > diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c > index 2fe1128d9f3d..67059daf4d26 100644 > --- a/arch/arm64/kvm/fpsimd.c > +++ b/arch/arm64/kvm/fpsimd.c > @@ -28,23 +28,29 @@ int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu) > { > int ret; > > - struct thread_info *ti = ¤t->thread_info; > - struct user_fpsimd_state *fpsimd = ¤t->thread.uw.fpsimd_state; > + struct thread_info *ti = vcpu->arch.kern_thread_info; > + struct user_fpsimd_state *fpsimd = vcpu->arch.kern_fpsimd_state; > > /* > * Make sure the host task thread flags and fpsimd state are > * visible to hyp: > */ > + kvm_unshare_hyp(ti, ti + 1); At this stage, the old thread may have been destroyed and the memory recycled. What happens if, in the interval, that memory gets shared again in another context? My guts feeling is that either the sharing fails, or the unshare above breaks something else (the refcounting doesn't save us if the sharing is not with HYP). In any case, I wonder whether we need a notification from the core code that a thread for which we have a HYP mapping is gone so that we can mop up the mapping at that point. That's similar to what we have for MMU notifiers and S2 PTs. This is doable by hooking into fpsimd_release_task() and extending thread_struct to track the sharing with HYP. Thanks, M.
On Saturday 16 Oct 2021 at 13:25:45 (+0100), Marc Zyngier wrote: > At this stage, the old thread may have been destroyed and the memory > recycled. What happens if, in the interval, that memory gets shared > again in another context? My guts feeling is that either the sharing > fails, or the unshare above breaks something else (the refcounting > doesn't save us if the sharing is not with HYP). Aha, yes, that's a good point. The problematic scenario would be: a vcpu runs in the context of task A, then blocks. Then task A is destroyed, but the vcpu isn't (possibly because e.g. userspace intends to run it in the context of another task or something along those lines). But the thread_info and fpsimd_state of task A remain shared with the hypervisor until the next vcpu run, even though the memory has been freed by the host, and is possibly reallocated to another guest in the meantime. So yes, at this point sharing/donating this memory range with a new guest will fail, and confuse the host massively :/ > In any case, I wonder whether we need a notification from the core > code that a thread for which we have a HYP mapping is gone so that we > can mop up the mapping at that point. That's similar to what we have > for MMU notifiers and S2 PTs. > > This is doable by hooking into fpsimd_release_task() and extending > thread_struct to track the sharing with HYP. I've been looking into this, but struggled to find a good way to avoid all the races. Specifically, handling the case where a vcpu and the task which last ran it get destroyed at the same time isn't that easy to handle: you end up having to maintain pointers from the task to the vcpu and vice versa, but I have no obvious idea to update them both in a non-racy way (other than having a one big lock to serialize all those changes, but that would mean serializing all task destructions so that really doesn't scale). Another option is to take a refcount on 'current' from kvm_arch_vcpu_run_map_fp() before sharing thread-specific structs with the hyp and release the refcount of the previous task after unsharing. But that means we'll have to also drop the refcount when the vcpu gets destroyed, as well as explicitly unshare at that point. Shouldn't be too bad I think. Thoughts? Thanks, Quentin
On Monday 18 Oct 2021 at 11:32:13 (+0100), Quentin Perret wrote: > Another option is to take a refcount on 'current' from > kvm_arch_vcpu_run_map_fp() before sharing thread-specific structs with > the hyp and release the refcount of the previous task after unsharing. > But that means we'll have to also drop the refcount when the vcpu > gets destroyed, as well as explicitly unshare at that point. Shouldn't > be too bad I think. Thoughts? Something like the below seems to work OK on my setup, including SIGKILL'ing the guest and such. How much do you hate it? diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index f8be56d5342b..50598d704c71 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -322,6 +322,7 @@ struct kvm_vcpu_arch { struct thread_info *host_thread_info; /* hyp VA */ struct user_fpsimd_state *host_fpsimd_state; /* hyp VA */ + struct task_struct *parent_task; struct { /* {Break,watch}point registers */ @@ -738,6 +739,7 @@ int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu); void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu); void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu); void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu); +void kvm_vcpu_unshare_task_fp(struct kvm_vcpu *vcpu); static inline bool kvm_pmu_counter_deferred(struct perf_event_attr *attr) { diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c index 2fe1128d9f3d..27afeebbe1cb 100644 --- a/arch/arm64/kvm/fpsimd.c +++ b/arch/arm64/kvm/fpsimd.c @@ -15,6 +15,22 @@ #include <asm/kvm_mmu.h> #include <asm/sysreg.h> +void kvm_vcpu_unshare_task_fp(struct kvm_vcpu *vcpu) +{ + struct task_struct *p = vcpu->arch.parent_task; + struct user_fpsimd_state *fpsimd; + struct thread_info *ti; + + if (!static_branch_likely(&kvm_protected_mode_initialized) || !p) + return; + + ti = &p->thread_info; + kvm_unshare_hyp(ti, ti + 1); + fpsimd = &p->thread.uw.fpsimd_state; + kvm_unshare_hyp(fpsimd, fpsimd + 1); + put_task_struct(p); +} + /* * Called on entry to KVM_RUN unless this vcpu previously ran at least * once and the most recent prior KVM_RUN for this vcpu was called from @@ -31,6 +47,8 @@ int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu) struct thread_info *ti = ¤t->thread_info; struct user_fpsimd_state *fpsimd = ¤t->thread.uw.fpsimd_state; + kvm_vcpu_unshare_task_fp(vcpu); + /* * Make sure the host task thread flags and fpsimd state are * visible to hyp: @@ -45,6 +63,17 @@ int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu) vcpu->arch.host_thread_info = kern_hyp_va(ti); vcpu->arch.host_fpsimd_state = kern_hyp_va(fpsimd); + + /* + * We need to keep current's task_struct pinned until its data has been + * unshared with the hypervisor to make sure it is not re-used by the + * kernel and donated to someone else while already shared -- see + * kvm_vcpu_unshare_task_fp() for the matching put_task_struct(). + */ + if (static_branch_likely(&kvm_protected_mode_initialized)) { + get_task_struct(current); + vcpu->arch.parent_task = current; + } error: return ret; } diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c index 5ce36b0a3343..c2a2cd7d5748 100644 --- a/arch/arm64/kvm/reset.c +++ b/arch/arm64/kvm/reset.c @@ -141,7 +141,13 @@ bool kvm_arm_vcpu_is_finalized(struct kvm_vcpu *vcpu) void kvm_arm_vcpu_destroy(struct kvm_vcpu *vcpu) { - kfree(vcpu->arch.sve_state); + void *sve_state = vcpu->arch.sve_state; + + kvm_vcpu_unshare_task_fp(vcpu); + kvm_unshare_hyp(vcpu, vcpu + 1); + if (sve_state && vcpu->arch.has_run_once) + kvm_unshare_hyp(sve_state, sve_state + vcpu_sve_state_size(vcpu)); + kfree(sve_state); } static void kvm_vcpu_reset_sve(struct kvm_vcpu *vcpu)
On 2021-10-18 15:03, Quentin Perret wrote: > On Monday 18 Oct 2021 at 11:32:13 (+0100), Quentin Perret wrote: >> Another option is to take a refcount on 'current' from >> kvm_arch_vcpu_run_map_fp() before sharing thread-specific structs with >> the hyp and release the refcount of the previous task after unsharing. >> But that means we'll have to also drop the refcount when the vcpu >> gets destroyed, as well as explicitly unshare at that point. Shouldn't >> be too bad I think. Thoughts? > > Something like the below seems to work OK on my setup, including > SIGKILL'ing the guest and such. How much do you hate it? It is annoyingly elegant! Small nitpick below. > > diff --git a/arch/arm64/include/asm/kvm_host.h > b/arch/arm64/include/asm/kvm_host.h > index f8be56d5342b..50598d704c71 100644 > --- a/arch/arm64/include/asm/kvm_host.h > +++ b/arch/arm64/include/asm/kvm_host.h > @@ -322,6 +322,7 @@ struct kvm_vcpu_arch { > > struct thread_info *host_thread_info; /* hyp VA */ > struct user_fpsimd_state *host_fpsimd_state; /* hyp VA */ > + struct task_struct *parent_task; > > struct { > /* {Break,watch}point registers */ > @@ -738,6 +739,7 @@ int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu > *vcpu); > void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu); > void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu); > void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu); > +void kvm_vcpu_unshare_task_fp(struct kvm_vcpu *vcpu); > > static inline bool kvm_pmu_counter_deferred(struct perf_event_attr > *attr) > { > diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c > index 2fe1128d9f3d..27afeebbe1cb 100644 > --- a/arch/arm64/kvm/fpsimd.c > +++ b/arch/arm64/kvm/fpsimd.c > @@ -15,6 +15,22 @@ > #include <asm/kvm_mmu.h> > #include <asm/sysreg.h> > > +void kvm_vcpu_unshare_task_fp(struct kvm_vcpu *vcpu) > +{ > + struct task_struct *p = vcpu->arch.parent_task; > + struct user_fpsimd_state *fpsimd; > + struct thread_info *ti; > + > + if (!static_branch_likely(&kvm_protected_mode_initialized) || !p) Shouldn't this be a check on is_protected_kvm_enabled() instead? The two should be equivalent outside of the initialisation code... Otherwise, ship it. M.
On Monday 18 Oct 2021 at 18:12:22 (+0100), Marc Zyngier wrote: > On 2021-10-18 15:03, Quentin Perret wrote: > > On Monday 18 Oct 2021 at 11:32:13 (+0100), Quentin Perret wrote: > > > Another option is to take a refcount on 'current' from > > > kvm_arch_vcpu_run_map_fp() before sharing thread-specific structs with > > > the hyp and release the refcount of the previous task after unsharing. > > > But that means we'll have to also drop the refcount when the vcpu > > > gets destroyed, as well as explicitly unshare at that point. Shouldn't > > > be too bad I think. Thoughts? > > > > Something like the below seems to work OK on my setup, including > > SIGKILL'ing the guest and such. How much do you hate it? > > It is annoyingly elegant! Small nitpick below. > > > > > diff --git a/arch/arm64/include/asm/kvm_host.h > > b/arch/arm64/include/asm/kvm_host.h > > index f8be56d5342b..50598d704c71 100644 > > --- a/arch/arm64/include/asm/kvm_host.h > > +++ b/arch/arm64/include/asm/kvm_host.h > > @@ -322,6 +322,7 @@ struct kvm_vcpu_arch { > > > > struct thread_info *host_thread_info; /* hyp VA */ > > struct user_fpsimd_state *host_fpsimd_state; /* hyp VA */ > > + struct task_struct *parent_task; > > > > struct { > > /* {Break,watch}point registers */ > > @@ -738,6 +739,7 @@ int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu); > > void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu); > > void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu); > > void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu); > > +void kvm_vcpu_unshare_task_fp(struct kvm_vcpu *vcpu); > > > > static inline bool kvm_pmu_counter_deferred(struct perf_event_attr > > *attr) > > { > > diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c > > index 2fe1128d9f3d..27afeebbe1cb 100644 > > --- a/arch/arm64/kvm/fpsimd.c > > +++ b/arch/arm64/kvm/fpsimd.c > > @@ -15,6 +15,22 @@ > > #include <asm/kvm_mmu.h> > > #include <asm/sysreg.h> > > > > +void kvm_vcpu_unshare_task_fp(struct kvm_vcpu *vcpu) > > +{ > > + struct task_struct *p = vcpu->arch.parent_task; > > + struct user_fpsimd_state *fpsimd; > > + struct thread_info *ti; > > + > > + if (!static_branch_likely(&kvm_protected_mode_initialized) || !p) > > Shouldn't this be a check on is_protected_kvm_enabled() instead? > The two should be equivalent outside of the initialisation code... Yup, it'd be nice to do checks on kvm_protected_mode_initialized only when they're strictly necessary, and that's not the case here. I'll fold that change in v2. Cheers Quentin
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index f8be56d5342b..8b61cdcd1b29 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -322,6 +322,8 @@ struct kvm_vcpu_arch { struct thread_info *host_thread_info; /* hyp VA */ struct user_fpsimd_state *host_fpsimd_state; /* hyp VA */ + struct thread_info *kern_thread_info; + struct user_fpsimd_state *kern_fpsimd_state; struct { /* {Break,watch}point registers */ diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h index 185d0f62b724..81839e9a8a24 100644 --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -151,6 +151,7 @@ static __always_inline unsigned long __kern_hyp_va(unsigned long v) #include <asm/stage2_pgtable.h> int kvm_share_hyp(void *from, void *to); +void kvm_unshare_hyp(void *from, void *to); int create_hyp_mappings(void *from, void *to, enum kvm_pgtable_prot prot); int create_hyp_io_mappings(phys_addr_t phys_addr, size_t size, void __iomem **kaddr, diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c index f2e74635332b..f11c51db6fe6 100644 --- a/arch/arm64/kvm/arm.c +++ b/arch/arm64/kvm/arm.c @@ -188,6 +188,8 @@ void kvm_arch_destroy_vm(struct kvm *kvm) } } atomic_set(&kvm->online_vcpus, 0); + + kvm_unshare_hyp(kvm, kvm + 1); } int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c index 2fe1128d9f3d..67059daf4d26 100644 --- a/arch/arm64/kvm/fpsimd.c +++ b/arch/arm64/kvm/fpsimd.c @@ -28,23 +28,29 @@ int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu) { int ret; - struct thread_info *ti = ¤t->thread_info; - struct user_fpsimd_state *fpsimd = ¤t->thread.uw.fpsimd_state; + struct thread_info *ti = vcpu->arch.kern_thread_info; + struct user_fpsimd_state *fpsimd = vcpu->arch.kern_fpsimd_state; /* * Make sure the host task thread flags and fpsimd state are * visible to hyp: */ + kvm_unshare_hyp(ti, ti + 1); + ti = ¤t->thread_info; ret = kvm_share_hyp(ti, ti + 1); if (ret) goto error; + kvm_unshare_hyp(fpsimd, fpsimd + 1); + fpsimd = ¤t->thread.uw.fpsimd_state; ret = kvm_share_hyp(fpsimd, fpsimd + 1); if (ret) goto error; vcpu->arch.host_thread_info = kern_hyp_va(ti); vcpu->arch.host_fpsimd_state = kern_hyp_va(fpsimd); + vcpu->arch.kern_thread_info = ti; + vcpu->arch.kern_fpsimd_state = fpsimd; error: return ret; } diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c index bc9865a8c988..f01b0e49e262 100644 --- a/arch/arm64/kvm/mmu.c +++ b/arch/arm64/kvm/mmu.c @@ -300,6 +300,22 @@ int kvm_share_hyp(void *from, void *to) nr_pages); } +void kvm_unshare_hyp(void *from, void *to) +{ + phys_addr_t start, end; + u64 nr_pages; + + if (is_kernel_in_hyp_mode() || kvm_host_owns_hyp_mappings() || !from) + return; + + start = ALIGN_DOWN(kvm_kaddr_to_phys(from), PAGE_SIZE); + end = PAGE_ALIGN(kvm_kaddr_to_phys(to)); + nr_pages = (end - start) >> PAGE_SHIFT; + + WARN_ON(kvm_call_hyp_nvhe(__pkvm_host_unshare_hyp, __phys_to_pfn(start), + nr_pages)); +} + /** * create_hyp_mappings - duplicate a kernel virtual address range in Hyp mode * @from: The virtual kernel start address of the range diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c index 5ce36b0a3343..e3e9c9e1f1c8 100644 --- a/arch/arm64/kvm/reset.c +++ b/arch/arm64/kvm/reset.c @@ -141,7 +141,18 @@ bool kvm_arm_vcpu_is_finalized(struct kvm_vcpu *vcpu) void kvm_arm_vcpu_destroy(struct kvm_vcpu *vcpu) { - kfree(vcpu->arch.sve_state); + struct user_fpsimd_state *fpsimd = vcpu->arch.kern_fpsimd_state; + struct thread_info *ti = vcpu->arch.kern_thread_info; + void *sve_state = vcpu->arch.sve_state; + + kvm_unshare_hyp(vcpu, vcpu + 1); + if (ti) + kvm_unshare_hyp(ti, ti + 1); + if (fpsimd) + kvm_unshare_hyp(fpsimd, fpsimd + 1); + if (sve_state && vcpu->arch.has_run_once) + kvm_unshare_hyp(sve_state, sve_state + vcpu_sve_state_size(vcpu)); + kfree(sve_state); } static void kvm_vcpu_reset_sve(struct kvm_vcpu *vcpu)
Make use of the newly introduced unshare hypercall during guest teardown to unmap guest-related data structures from the hyp stage-1. Signed-off-by: Quentin Perret <qperret@google.com> --- arch/arm64/include/asm/kvm_host.h | 2 ++ arch/arm64/include/asm/kvm_mmu.h | 1 + arch/arm64/kvm/arm.c | 2 ++ arch/arm64/kvm/fpsimd.c | 10 ++++++++-- arch/arm64/kvm/mmu.c | 16 ++++++++++++++++ arch/arm64/kvm/reset.c | 13 ++++++++++++- 6 files changed, 41 insertions(+), 3 deletions(-)