
[v3,09/41] KVM: arm64: Defer restoring host VFP state to vcpu_put

Message ID 20180112120747.27999-10-christoffer.dall@linaro.org (mailing list archive)
State New, archived

Commit Message

Christoffer Dall Jan. 12, 2018, 12:07 p.m. UTC
Avoid saving the guest VFP registers and restoring the host VFP
registers on every exit from the VM.  Only when we're about to run
userspace or other threads in the kernel do we really have to switch the
state back to the host state.

We still initially configure the VFP registers to trap when entering the
VM, but the difference is that we now leave the guest state in the
hardware registers as long as we're running this VCPU, even if we
occasionally trap to the host, and we only restore the host state when
we return to user space or when scheduling another thread.

Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 arch/arm64/include/asm/kvm_host.h |  3 +++
 arch/arm64/kernel/asm-offsets.c   |  1 +
 arch/arm64/kvm/hyp/entry.S        |  3 +++
 arch/arm64/kvm/hyp/switch.c       | 48 ++++++++++++---------------------------
 arch/arm64/kvm/hyp/sysreg-sr.c    | 21 ++++++++++++++---
 5 files changed, 40 insertions(+), 36 deletions(-)
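The switching scheme described in the commit message can be sketched as a small user-space model; every identifier below is invented for illustration (the real logic lives in entry.S, switch.c and sysreg-sr.c). The first guest FPSIMD access after vcpu_load() traps and loads the guest state, later exits to the host leave it in place, and only vcpu_put() switches back to the host state.

```c
#include <assert.h>

/* Toy model of the patch's lazy VFP switching.  All names are invented
 * for illustration; this is not the kernel's actual API.  Guest FPSIMD
 * accesses trap until the guest state is loaded; it then stays in the
 * "hardware" registers until vcpu_put().
 */
static int guest_vfp_loaded;  /* models vcpu->arch.guest_vfp_loaded */
static int guest_loads;       /* traps taken to load the guest state */
static int host_restores;     /* times the host state was restored */

static void vcpu_load(void)
{
	guest_vfp_loaded = 0;     /* traps re-enabled on entry */
}

static void guest_uses_fpsimd(void)
{
	if (!guest_vfp_loaded) {  /* CPTR_EL2.TFP-style trap fires once */
		guest_vfp_loaded = 1;
		guest_loads++;
	}
}

static void guest_exit_to_host(void)
{
	/* Ordinary trap to the host: guest state is left in place. */
}

static void vcpu_put(void)
{
	if (guest_vfp_loaded) {   /* only now is the host state restored */
		guest_vfp_loaded = 0;
		host_restores++;
	}
}
```

However many times the guest exits to the host within one run, the guest state is loaded once and the host state restored once.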

Comments

Dave Martin Jan. 22, 2018, 5:33 p.m. UTC | #1
On Fri, Jan 12, 2018 at 01:07:15PM +0100, Christoffer Dall wrote:
> Avoid saving the guest VFP registers and restoring the host VFP
> registers on every exit from the VM.  Only when we're about to run
> userspace or other threads in the kernel do we really have to switch the
> state back to the host state.
> 
> We still initially configure the VFP registers to trap when entering the
> VM, but the difference is that we now leave the guest state in the
> hardware registers as long as we're running this VCPU, even if we
> occasionally trap to the host, and we only restore the host state when
> we return to user space or when scheduling another thread.
> 
> Reviewed-by: Andrew Jones <drjones@redhat.com>
> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>

[...]

> diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
> index 883a6383cd36..848a46eb33bf 100644
> --- a/arch/arm64/kvm/hyp/sysreg-sr.c
> +++ b/arch/arm64/kvm/hyp/sysreg-sr.c

[...]

> @@ -213,6 +215,19 @@ void kvm_vcpu_load_sysregs(struct kvm_vcpu *vcpu)
>   */
>  void kvm_vcpu_put_sysregs(struct kvm_vcpu *vcpu)
>  {
> +	struct kvm_cpu_context *host_ctxt = vcpu->arch.host_cpu_context;
> +	struct kvm_cpu_context *guest_ctxt = &vcpu->arch.ctxt;
> +
> +	/* Restore host FP/SIMD state */
> +	if (vcpu->arch.guest_vfp_loaded) {
> +		if (vcpu_el1_is_32bit(vcpu)) {
> +			kvm_call_hyp(__fpsimd32_save_state,
> +				     kern_hyp_va(guest_ctxt));
> +		}
> +		__fpsimd_save_state(&guest_ctxt->gp_regs.fp_regs);
> +		__fpsimd_restore_state(&host_ctxt->gp_regs.fp_regs);
> +		vcpu->arch.guest_vfp_loaded = 0;

Provided we've already marked the host FPSIMD state as dirty on the way
in, we probably don't need to restore it here.

In v4.15, the kvm_fpsimd_flush_cpu_state() call in
kvm_arch_vcpu_ioctl_run() is supposed to do this marking: currently
it's only done for SVE, since KVM was previously restoring the host
FPSIMD subset of the state anyway, but it could be made unconditional.

For a returning run ioctl, this would have the effect of deferring the
host FPSIMD reload until we return to userspace, which is probably
no more costly since the kernel must check whether to do this in
ret_to_user anyway; OTOH if the vcpu thread was preempted by some
other thread we save the cost of restoring the host state entirely here
... I think.
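The deferral being proposed here can be modelled outside the kernel. In the sketch below (all identifiers invented for illustration, not the actual kernel API), marking the host state stale at ioctl entry means vcpu_put no longer needs to restore it; the one reload happens in ret_to_user, regardless of how many guest exits occur in between.

```c
#include <assert.h>

/* Toy model of the deferred host-FPSIMD restore discussed above.
 * All names are invented for illustration; this is not the kernel's
 * actual FPSIMD/KVM API.
 */
static int host_regs_loaded;  /* host FPSIMD state live in "hardware" */
static int host_restores;     /* times the host state was reloaded */

static void kvm_fpsimd_flush_cpu_state_model(void)
{
	/* Unconditionally mark the host state stale on ioctl entry. */
	host_regs_loaded = 0;
}

static void vcpu_put_model(void)
{
	/* Host state already marked stale: nothing to restore here. */
}

static void ret_to_user_model(void)
{
	/* The existing TIF_FOREIGN_FPSTATE-style check reloads at most once. */
	if (!host_regs_loaded) {
		host_regs_loaded = 1;
		host_restores++;
	}
}

/* One KVM_RUN ioctl with n guest exits; returns the host reload count. */
static int run_ioctl(int n_exits)
{
	host_regs_loaded = 1;
	host_restores = 0;
	kvm_fpsimd_flush_cpu_state_model();
	for (int i = 0; i < n_exits; i++)
		;                 /* guest exit -> back into KVM, no host restore */
	vcpu_put_model();
	ret_to_user_model();
	return host_restores;
}
```

The reload count is one whether the guest exits zero times or a hundred, which is the "no more costly" argument: ret_to_user already performs this check anyway.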

Ultimately I'd like to go one better and actually treat a vcpu as a
first-class fpsimd context, so that taking an interrupt to the host
and then reentering the guest doesn't cause any reload at all.  But
that feels like too big a step for this series, and there are likely
side-issues I've not thought about yet.

Cheers
---Dave
Christoffer Dall Jan. 25, 2018, 7:46 p.m. UTC | #2
On Mon, Jan 22, 2018 at 05:33:28PM +0000, Dave Martin wrote:
> On Fri, Jan 12, 2018 at 01:07:15PM +0100, Christoffer Dall wrote:
> > Avoid saving the guest VFP registers and restoring the host VFP
> > registers on every exit from the VM.  Only when we're about to run
> > userspace or other threads in the kernel do we really have to switch the
> > state back to the host state.
> > 
> > We still initially configure the VFP registers to trap when entering the
> > VM, but the difference is that we now leave the guest state in the
> > hardware registers as long as we're running this VCPU, even if we
> > occasionally trap to the host, and we only restore the host state when
> > we return to user space or when scheduling another thread.
> > 
> > Reviewed-by: Andrew Jones <drjones@redhat.com>
> > Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
> > Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> 
> [...]
> 
> > diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
> > index 883a6383cd36..848a46eb33bf 100644
> > --- a/arch/arm64/kvm/hyp/sysreg-sr.c
> > +++ b/arch/arm64/kvm/hyp/sysreg-sr.c
> 
> [...]
> 
> > @@ -213,6 +215,19 @@ void kvm_vcpu_load_sysregs(struct kvm_vcpu *vcpu)
> >   */
> >  void kvm_vcpu_put_sysregs(struct kvm_vcpu *vcpu)
> >  {
> > +	struct kvm_cpu_context *host_ctxt = vcpu->arch.host_cpu_context;
> > +	struct kvm_cpu_context *guest_ctxt = &vcpu->arch.ctxt;
> > +
> > +	/* Restore host FP/SIMD state */
> > +	if (vcpu->arch.guest_vfp_loaded) {
> > +		if (vcpu_el1_is_32bit(vcpu)) {
> > +			kvm_call_hyp(__fpsimd32_save_state,
> > +				     kern_hyp_va(guest_ctxt));
> > +		}
> > +		__fpsimd_save_state(&guest_ctxt->gp_regs.fp_regs);
> > +		__fpsimd_restore_state(&host_ctxt->gp_regs.fp_regs);
> > +		vcpu->arch.guest_vfp_loaded = 0;
> 
> Provided we've already marked the host FPSIMD state as dirty on the way
> in, we probably don't need to restore it here.
> 
> In v4.15, the kvm_fpsimd_flush_cpu_state() call in
> kvm_arch_vcpu_ioctl_run() is supposed to do this marking: currently
> it's only done for SVE, since KVM was previously restoring the host
> FPSIMD subset of the state anyway, but it could be made unconditional.
> 
> For a returning run ioctl, this would have the effect of deferring the
> host FPSIMD reload until we return to userspace, which is probably
> no more costly since the kernel must check whether to do this in
> ret_to_user anyway; OTOH if the vcpu thread was preempted by some
> other thread we save the cost of restoring the host state entirely here
> ... I think.

Yes, I agree.  However, the low-level logic in
arch/arm64/kvm/hyp/entry.S:__fpsimd_guest_restore currently saves the
host state into vcpu->arch.host_cpu_context->gp_regs.fp_regs (where
host_cpu_context is a KVM-specific per-cpu variable).  I think that
means that simply marking the state as invalid would cause the kernel to
restore some potentially stale values when returning to userspace.  Am I
missing something?

It might very well be possible to change the logic so that we store the
host state in the same place where task_fpsimd_save() would have, and I
think that would make what you suggest possible.

I'd like to make that a separate change from this patch though, as we're
already changing quite a bit with this series, so I'm trying to make any
logical change as contained per patch as possible, so that problems can
be spotted by bisecting.

> 
> Ultimately I'd like to go one better and actually treat a vcpu as a
> first-class fpsimd context, so that taking an interrupt to the host
> and then reentering the guest doesn't cause any reload at all.  

That should be the case already; kvm_vcpu_put_sysregs() is only called
when you run another thread (preemptively or voluntarily), or when you
return to user space, but making the vcpu fpsimd context a first-class
citizen fpsimd context would mean that you can run another thread (and
maybe run userspace if it doesn't use fpsimd?) without having to
save/restore anything.  Am I getting this right?

> But
> that feels like too big a step for this series, and there are likely
> side-issues I've not thought about yet.
> 

It should definitely be in separate patches, but I would be open to
tagging something on to the end of this series if we can stabilize this
series early after -rc1 is out.

Thanks,
-Christoffer
Dave Martin Feb. 7, 2018, 4:49 p.m. UTC | #3
On Thu, Jan 25, 2018 at 08:46:53PM +0100, Christoffer Dall wrote:
> On Mon, Jan 22, 2018 at 05:33:28PM +0000, Dave Martin wrote:
> > On Fri, Jan 12, 2018 at 01:07:15PM +0100, Christoffer Dall wrote:
> > > Avoid saving the guest VFP registers and restoring the host VFP
> > > registers on every exit from the VM.  Only when we're about to run
> > > userspace or other threads in the kernel do we really have to switch the
> > > state back to the host state.
> > > 
> > > We still initially configure the VFP registers to trap when entering the
> > > VM, but the difference is that we now leave the guest state in the
> > > hardware registers as long as we're running this VCPU, even if we
> > > occasionally trap to the host, and we only restore the host state when
> > > we return to user space or when scheduling another thread.
> > > 
> > > Reviewed-by: Andrew Jones <drjones@redhat.com>
> > > Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
> > > Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> > 
> > [...]
> > 
> > > diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
> > > index 883a6383cd36..848a46eb33bf 100644
> > > --- a/arch/arm64/kvm/hyp/sysreg-sr.c
> > > +++ b/arch/arm64/kvm/hyp/sysreg-sr.c
> > 
> > [...]
> > 
> > > @@ -213,6 +215,19 @@ void kvm_vcpu_load_sysregs(struct kvm_vcpu *vcpu)
> > >   */
> > >  void kvm_vcpu_put_sysregs(struct kvm_vcpu *vcpu)
> > >  {
> > > +	struct kvm_cpu_context *host_ctxt = vcpu->arch.host_cpu_context;
> > > +	struct kvm_cpu_context *guest_ctxt = &vcpu->arch.ctxt;
> > > +
> > > +	/* Restore host FP/SIMD state */
> > > +	if (vcpu->arch.guest_vfp_loaded) {
> > > +		if (vcpu_el1_is_32bit(vcpu)) {
> > > +			kvm_call_hyp(__fpsimd32_save_state,
> > > +				     kern_hyp_va(guest_ctxt));
> > > +		}
> > > +		__fpsimd_save_state(&guest_ctxt->gp_regs.fp_regs);
> > > +		__fpsimd_restore_state(&host_ctxt->gp_regs.fp_regs);
> > > +		vcpu->arch.guest_vfp_loaded = 0;
> > 
> > Provided we've already marked the host FPSIMD state as dirty on the way
> > in, we probably don't need to restore it here.
> > 
> > In v4.15, the kvm_fpsimd_flush_cpu_state() call in
> > kvm_arch_vcpu_ioctl_run() is supposed to do this marking: currently
> > it's only done for SVE, since KVM was previously restoring the host
> > FPSIMD subset of the state anyway, but it could be made unconditional.
> > 
> > For a returning run ioctl, this would have the effect of deferring the
> > host FPSIMD reload until we return to userspace, which is probably
> > no more costly since the kernel must check whether to do this in
> > ret_to_user anyway; OTOH if the vcpu thread was preempted by some
> > other thread we save the cost of restoring the host state entirely here
> > ... I think.
> 
> Yes, I agree.  However, the low-level logic in
> arch/arm64/kvm/hyp/entry.S:__fpsimd_guest_restore currently saves the
> host state into vcpu->arch.host_cpu_context->gp_regs.fp_regs (where
> host_cpu_context is a KVM-specific per-cpu variable).  I think that
> means that simply marking the state as invalid would cause the kernel to
> restore some potentially stale values when returning to userspace.  Am I
> missing something?

I think my point was that there would be no need for the low-level
save of the host fpsimd state currently done by hyp.  At all.  The
state would already have been saved off to thread_struct before
entering the guest.

This would result in a redundant save, but only when the host fpsimd
state is dirty and the guest vcpu doesn't touch fpsimd before trapping
back to the host.

For the host, the fpsimd state is only dirty after entering the kernel
from userspace (or after certain other things like sigreturn or ptrace).
So this approach would still avoid repeated save/restore when cycling
between the guest and the kvm code in the host.
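The cost trade-off described here can be made concrete with two small invented helpers (not kernel code): under the old scheme hyp saves the host state lazily on the guest's first FPSIMD trap, while under the proposed scheme the host state is saved up front at ioctl entry. The save counts only differ when the guest never touches FPSIMD.

```c
#include <assert.h>

/* Toy comparison of the two schemes described above (invented names,
 * not kernel code).  Return value is the number of host-state saves
 * performed over one run of the guest.
 */
static int saves_lazy_hyp(int guest_uses_fpsimd)
{
	/* Old scheme: hyp saves the host state only when the guest's
	 * first FPSIMD access traps. */
	return guest_uses_fpsimd ? 1 : 0;
}

static int saves_upfront(int guest_uses_fpsimd)
{
	/* Proposed scheme: one unconditional save at ioctl entry. */
	(void)guest_uses_fpsimd;
	return 1;
}
```

When the guest does use FPSIMD the schemes cost the same; the single redundant save appears only in the case where the host state was dirty and the guest never touched the registers.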

> > It might very well be possible to change the logic so that we store the
> > host state in the same place where task_fpsimd_save() would have, and I
> > think that would make what you suggest possible.

That's certainly possible, but I viewed that as harder.  It would be
necessary to map the host thread_struct into hyp etc. etc.

> I'd like to make that a separate change from this patch though, as we're
> already changing quite a bit with this series, so I'm trying to make any
> logical change as contained per patch as possible, so that problems can
> be spotted by bisecting.

Yes, I think that's wise.

> > Ultimately I'd like to go one better and actually treat a vcpu as a
> > first-class fpsimd context, so that taking an interrupt to the host
> > and then reentering the guest doesn't cause any reload at all.  
> 
> That should be the case already; kvm_vcpu_put_sysregs() is only called
> when you run another thread (preemptively or voluntarily), or when you
> return to user space, but making the vcpu fpsimd context a first-class
> citizen fpsimd context would mean that you can run another thread (and
> maybe run userspace if it doesn't use fpsimd?) without having to
> save/restore anything.  Am I getting this right?

Yes (except that if a return to userspace happens then FPSIMD will be
restored at that point: there is no laziness there -- it _could_
be lazy, but it's deemed unlikely to be a performance win due to the
fact that the compiler can and does generate FPSIMD code quite
liberally by default).

For the case of being preempted within the kernel with no ret_to_user,
you are correct.

> 
> > But
> > that feels like too big a step for this series, and there are likely
> > side-issues I've not thought about yet.
> > 
> 
> It should definitely be in separate patches, but I would be open to
> tagging something on to the end of this series if we can stabilize this
> series early after -rc1 is out.

I haven't fully got my head around it, but we can see where we get to.
Best not to rush into it if there's any doubt...

Cheers
---Dave
Christoffer Dall Feb. 7, 2018, 5:56 p.m. UTC | #4
On Wed, Feb 07, 2018 at 04:49:55PM +0000, Dave Martin wrote:
> On Thu, Jan 25, 2018 at 08:46:53PM +0100, Christoffer Dall wrote:
> > On Mon, Jan 22, 2018 at 05:33:28PM +0000, Dave Martin wrote:
> > > On Fri, Jan 12, 2018 at 01:07:15PM +0100, Christoffer Dall wrote:
> > > > Avoid saving the guest VFP registers and restoring the host VFP
> > > > registers on every exit from the VM.  Only when we're about to run
> > > > userspace or other threads in the kernel do we really have to switch the
> > > > state back to the host state.
> > > > 
> > > > We still initially configure the VFP registers to trap when entering the
> > > > VM, but the difference is that we now leave the guest state in the
> > > > hardware registers as long as we're running this VCPU, even if we
> > > > occasionally trap to the host, and we only restore the host state when
> > > > we return to user space or when scheduling another thread.
> > > > 
> > > > Reviewed-by: Andrew Jones <drjones@redhat.com>
> > > > Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
> > > > Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> > > 
> > > [...]
> > > 
> > > > diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
> > > > index 883a6383cd36..848a46eb33bf 100644
> > > > --- a/arch/arm64/kvm/hyp/sysreg-sr.c
> > > > +++ b/arch/arm64/kvm/hyp/sysreg-sr.c
> > > 
> > > [...]
> > > 
> > > > @@ -213,6 +215,19 @@ void kvm_vcpu_load_sysregs(struct kvm_vcpu *vcpu)
> > > >   */
> > > >  void kvm_vcpu_put_sysregs(struct kvm_vcpu *vcpu)
> > > >  {
> > > > +	struct kvm_cpu_context *host_ctxt = vcpu->arch.host_cpu_context;
> > > > +	struct kvm_cpu_context *guest_ctxt = &vcpu->arch.ctxt;
> > > > +
> > > > +	/* Restore host FP/SIMD state */
> > > > +	if (vcpu->arch.guest_vfp_loaded) {
> > > > +		if (vcpu_el1_is_32bit(vcpu)) {
> > > > +			kvm_call_hyp(__fpsimd32_save_state,
> > > > +				     kern_hyp_va(guest_ctxt));
> > > > +		}
> > > > +		__fpsimd_save_state(&guest_ctxt->gp_regs.fp_regs);
> > > > +		__fpsimd_restore_state(&host_ctxt->gp_regs.fp_regs);
> > > > +		vcpu->arch.guest_vfp_loaded = 0;
> > > 
> > > Provided we've already marked the host FPSIMD state as dirty on the way
> > > in, we probably don't need to restore it here.
> > > 
> > > In v4.15, the kvm_fpsimd_flush_cpu_state() call in
> > > kvm_arch_vcpu_ioctl_run() is supposed to do this marking: currently
> > > it's only done for SVE, since KVM was previously restoring the host
> > > FPSIMD subset of the state anyway, but it could be made unconditional.
> > > 
> > > For a returning run ioctl, this would have the effect of deferring the
> > > host FPSIMD reload until we return to userspace, which is probably
> > > no more costly since the kernel must check whether to do this in
> > > ret_to_user anyway; OTOH if the vcpu thread was preempted by some
> > > other thread we save the cost of restoring the host state entirely here
> > > ... I think.
> > 
> > Yes, I agree.  However, the low-level logic in
> > arch/arm64/kvm/hyp/entry.S:__fpsimd_guest_restore currently saves the
> > host state into vcpu->arch.host_cpu_context->gp_regs.fp_regs (where
> > host_cpu_context is a KVM-specific per-cpu variable).  I think that
> > means that simply marking the state as invalid would cause the kernel to
> > restore some potentially stale values when returning to userspace.  Am I
> > missing something?
> 
> I think my point was that there would be no need for the low-level
> save of the host fpsimd state currently done by hyp.  At all.  The
> state would already have been saved off to thread_struct before
> entering the guest.

Ah, so if userspace touched any FPSIMD state, then we always save that
state when entering the kernel, even if we're just going to return to
the same userspace process anyway?  (For any system call etc.?)

> 
> This would result in a redundant save, but only when the host fpsimd
> state is dirty and the guest vcpu doesn't touch fpsimd before trapping
> back to the host.
> 
> For the host, the fpsimd state is only dirty after entering the kernel
> from userspace (or after certain other things like sigreturn or ptrace).
> So this approach would still avoid repeated save/restore when cycling
> between the guest and the kvm code in the host.
> 

I see.

> > It might very well be possible to change the logic so that we store the
> > host state in the same place where task_fpsimd_save() would have, and I
> > think that would make what you suggest possible.
> 
> That's certainly possible, but I viewed that as harder.  It would be
> necessary to map the host thread_struct into hyp etc. etc.
> 

And even then, unnecessary because it would duplicate the existing state
save, IIUC above.

> > I'd like to make that a separate change from this patch though, as we're
> > already changing quite a bit with this series, so I'm trying to make any
> > logical change as contained per patch as possible, so that problems can
> > be spotted by bisecting.
> 
> Yes, I think that's wise.
> 

ok, I'll try to incorporate this as a separate patch for the next
revision.

> > > Ultimately I'd like to go one better and actually treat a vcpu as a
> > > first-class fpsimd context, so that taking an interrupt to the host
> > > and then reentering the guest doesn't cause any reload at all.  
> > 
> > That should be the case already; kvm_vcpu_put_sysregs() is only called
> > when you run another thread (preemptively or voluntarily), or when you
> > return to user space, but making the vcpu fpsimd context a first-class
> > citizen fpsimd context would mean that you can run another thread (and
> > maybe run userspace if it doesn't use fpsimd?) without having to
> > save/restore anything.  Am I getting this right?
> 
> Yes (except that if a return to userspace happens then FPSIMD will be
> restored at that point: there is no laziness there -- it _could_
> be lazy, but it's deemed unlikely to be a performance win due to the
> fact that the compiler can and does generate FPSIMD code quite
> liberally by default).
> 
> For the case of being preempted within the kernel with no ret_to_user,
> you are correct.
> 

ok, that would indeed also be useful for things like switching to a
vhost thread and returning to the vcpu thread.

> > 
> > > But
> > > that feels like too big a step for this series, and there are likely
> > > side-issues I've not thought about yet.
> > > 
> > 
> > It should definitely be in separate patches, but I would be open to
> > tagging something on to the end of this series if we can stabilize this
> > series early after -rc1 is out.
> 
> I haven't fully got my head around it, but we can see where we get to.
> Best not to rush into it if there's any doubt...
> 
Agreed, we can always add things later.

Thanks,
-Christoffer
Julien Grall Feb. 9, 2018, 3:26 p.m. UTC | #5
Hi Christoffer,

On 01/12/2018 12:07 PM, Christoffer Dall wrote:
> Avoid saving the guest VFP registers and restoring the host VFP
> registers on every exit from the VM.  Only when we're about to run
> userspace or other threads in the kernel do we really have to switch the

s/do// ?

> state back to the host state.
> 
> We still initially configure the VFP registers to trap when entering the
> VM, but the difference is that we now leave the guest state in the
> hardware registers as long as we're running this VCPU, even if we
> occasionally trap to the host, and we only restore the host state when
> we return to user space or when scheduling another thread.
> 
> Reviewed-by: Andrew Jones <drjones@redhat.com>
> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> ---
>   arch/arm64/include/asm/kvm_host.h |  3 +++
>   arch/arm64/kernel/asm-offsets.c   |  1 +
>   arch/arm64/kvm/hyp/entry.S        |  3 +++
>   arch/arm64/kvm/hyp/switch.c       | 48 ++++++++++++---------------------------
>   arch/arm64/kvm/hyp/sysreg-sr.c    | 21 ++++++++++++++---
>   5 files changed, 40 insertions(+), 36 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 0e9e7291a7e6..9e23bc968668 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -213,6 +213,9 @@ struct kvm_vcpu_arch {
>   	/* Guest debug state */
>   	u64 debug_flags;
>   
> +	/* 1 if the guest VFP state is loaded into the hardware */
> +	u8 guest_vfp_loaded;
> +
>   	/*
>   	 * We maintain more than a single set of debug registers to support
>   	 * debugging the guest from the host and to maintain separate host and
> diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
> index 612021dce84f..99467327c043 100644
> --- a/arch/arm64/kernel/asm-offsets.c
> +++ b/arch/arm64/kernel/asm-offsets.c
> @@ -133,6 +133,7 @@ int main(void)
>     DEFINE(CPU_GP_REGS,		offsetof(struct kvm_cpu_context, gp_regs));
>     DEFINE(CPU_USER_PT_REGS,	offsetof(struct kvm_regs, regs));
>     DEFINE(CPU_FP_REGS,		offsetof(struct kvm_regs, fp_regs));
> +  DEFINE(VCPU_GUEST_VFP_LOADED,	offsetof(struct kvm_vcpu, arch.guest_vfp_loaded));
>     DEFINE(VCPU_FPEXC32_EL2,	offsetof(struct kvm_vcpu, arch.ctxt.sys_regs[FPEXC32_EL2]));
>     DEFINE(VCPU_HOST_CONTEXT,	offsetof(struct kvm_vcpu, arch.host_cpu_context));
>     DEFINE(HOST_CONTEXT_VCPU,	offsetof(struct kvm_cpu_context, __hyp_running_vcpu));
> diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
> index a360ac6e89e9..53652287a236 100644
> --- a/arch/arm64/kvm/hyp/entry.S
> +++ b/arch/arm64/kvm/hyp/entry.S
> @@ -184,6 +184,9 @@ alternative_endif
>   	add	x0, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
>   	bl	__fpsimd_restore_state
>   
> +	mov	x0, #1
> +	strb	w0, [x3, #VCPU_GUEST_VFP_LOADED]
> +
>   	// Skip restoring fpexc32 for AArch64 guests
>   	mrs	x1, hcr_el2
>   	tbnz	x1, #HCR_RW_SHIFT, 1f
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index 12dc647a6e5f..29e44a20f5e3 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -24,43 +24,32 @@
>   #include <asm/fpsimd.h>
>   #include <asm/debug-monitors.h>
>   
> -static bool __hyp_text __fpsimd_enabled_nvhe(void)
> -{
> -	return !(read_sysreg(cptr_el2) & CPTR_EL2_TFP);
> -}
> -
> -static bool __hyp_text __fpsimd_enabled_vhe(void)
> -{
> -	return !!(read_sysreg(cpacr_el1) & CPACR_EL1_FPEN);
> -}
> -
> -static hyp_alternate_select(__fpsimd_is_enabled,
> -			    __fpsimd_enabled_nvhe, __fpsimd_enabled_vhe,
> -			    ARM64_HAS_VIRT_HOST_EXTN);
> -
> -bool __hyp_text __fpsimd_enabled(void)

Now that __fpsimd_enabled is removed, I think you need to remove the 
prototype in arch/arm64/include/asm/kvm_hyp.h too.

> -{
> -	return __fpsimd_is_enabled()();
> -}

Cheers,
Dave Martin Feb. 9, 2018, 3:59 p.m. UTC | #6
On Wed, Feb 07, 2018 at 06:56:44PM +0100, Christoffer Dall wrote:
> On Wed, Feb 07, 2018 at 04:49:55PM +0000, Dave Martin wrote:
> > On Thu, Jan 25, 2018 at 08:46:53PM +0100, Christoffer Dall wrote:
> > > On Mon, Jan 22, 2018 at 05:33:28PM +0000, Dave Martin wrote:
> > > > On Fri, Jan 12, 2018 at 01:07:15PM +0100, Christoffer Dall wrote:
> > > > > Avoid saving the guest VFP registers and restoring the host VFP
> > > > > registers on every exit from the VM.  Only when we're about to run
> > > > > userspace or other threads in the kernel do we really have to switch the
> > > > > state back to the host state.
> > > > > 
> > > > > We still initially configure the VFP registers to trap when entering the
> > > > > VM, but the difference is that we now leave the guest state in the
> > > > > hardware registers as long as we're running this VCPU, even if we
> > > > > occasionally trap to the host, and we only restore the host state when
> > > > > we return to user space or when scheduling another thread.
> > > > > 
> > > > > Reviewed-by: Andrew Jones <drjones@redhat.com>
> > > > > Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
> > > > > Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> > > > 
> > > > [...]
> > > > 
> > > > > diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
> > > > > index 883a6383cd36..848a46eb33bf 100644
> > > > > --- a/arch/arm64/kvm/hyp/sysreg-sr.c
> > > > > +++ b/arch/arm64/kvm/hyp/sysreg-sr.c
> > > > 
> > > > [...]
> > > > 
> > > > > @@ -213,6 +215,19 @@ void kvm_vcpu_load_sysregs(struct kvm_vcpu *vcpu)
> > > > >   */
> > > > >  void kvm_vcpu_put_sysregs(struct kvm_vcpu *vcpu)
> > > > >  {
> > > > > +	struct kvm_cpu_context *host_ctxt = vcpu->arch.host_cpu_context;
> > > > > +	struct kvm_cpu_context *guest_ctxt = &vcpu->arch.ctxt;
> > > > > +
> > > > > +	/* Restore host FP/SIMD state */
> > > > > +	if (vcpu->arch.guest_vfp_loaded) {
> > > > > +		if (vcpu_el1_is_32bit(vcpu)) {
> > > > > +			kvm_call_hyp(__fpsimd32_save_state,
> > > > > +				     kern_hyp_va(guest_ctxt));
> > > > > +		}
> > > > > +		__fpsimd_save_state(&guest_ctxt->gp_regs.fp_regs);
> > > > > +		__fpsimd_restore_state(&host_ctxt->gp_regs.fp_regs);
> > > > > +		vcpu->arch.guest_vfp_loaded = 0;
> > > > 
> > > > Provided we've already marked the host FPSIMD state as dirty on the way
> > > > in, we probably don't need to restore it here.
> > > > 
> > > > In v4.15, the kvm_fpsimd_flush_cpu_state() call in
> > > > kvm_arch_vcpu_ioctl_run() is supposed to do this marking: currently
> > > > it's only done for SVE, since KVM was previously restoring the host
> > > > FPSIMD subset of the state anyway, but it could be made unconditional.
> > > > 
> > > > For a returning run ioctl, this would have the effect of deferring the
> > > > host FPSIMD reload until we return to userspace, which is probably
> > > > no more costly since the kernel must check whether to do this in
> > > > ret_to_user anyway; OTOH if the vcpu thread was preempted by some
> > > > other thread we save the cost of restoring the host state entirely here
> > > > ... I think.
> > > 
> > > Yes, I agree.  However, the low-level logic in
> > > arch/arm64/kvm/hyp/entry.S:__fpsimd_guest_restore currently saves the
> > > host state into vcpu->arch.host_cpu_context->gp_regs.fp_regs (where
> > > host_cpu_context is a KVM-specific per-cpu variable).  I think that
> > > means that simply marking the state as invalid would cause the kernel to
> > > restore some potentially stale values when returning to userspace.  Am I
> > > missing something?
> > 
> > I think my point was that there would be no need for the low-level
> > save of the host fpsimd state currently done by hyp.  At all.  The
> > state would already have been saved off to thread_struct before
> > entering the guest.
> 
> Ah, so if userspace touched any FPSIMD state, then we always save that
> state when entering the kernel, even if we're just going to return to
> the same userspace process anyway?  (For any system call etc.?)

Not exactly.  The state is saved when the corresponding user task is
scheduled out, or when it would become stale because we're about to
modify the task_struct view of the state (sigreturn/PTRACE_SETREGSET).
The state is also saved if necessary by kernel_neon_begin().

Simply entering the kernel and returning to userspace doesn't have
this effect by itself.
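The rule being stated can be summarised in a tiny model (invented names, not the kernel API): the live FPSIMD registers remain the authoritative copy of the task's state across plain kernel entry and exit, and are only written back to thread_struct (or invalidated) by events such as schedule-out, kernel_neon_begin(), sigreturn, or PTRACE_SETREGSET.

```c
#include <assert.h>

/* Toy model of when the host's FPSIMD state leaves the registers,
 * per the explanation above.  All names invented for illustration.
 */
static int regs_are_authoritative = 1;  /* live regs hold the state */

static void task_fpsimd_save(void)
{
	/* State written back to (or invalidated in favour of)
	 * the thread_struct copy. */
	regs_are_authoritative = 0;
}

static void schedule_out(void)       { task_fpsimd_save(); }
static void kernel_neon_begin(void)  { task_fpsimd_save(); }
static void syscall_and_return(void) { /* nothing: regs stay live */ }
```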


Prior to the SVE patches, KVM makes itself orthogonal to the host
context switch machinery by ensuring that whatever the host had
in the FPSIMD regs at guest entry is restored before returning to
the host. (IIUC)  This means that redundant save/restore work is
done by KVM, but does have the advantage of simplicity.

This breaks for SVE though: the high bits of the Z-registers will be
zeroed as a side effect of the FPSIMD save/restore done by KVM.
This means that if the host has state in those bits then it must
be saved before entring the guest: that's what the new
kvm_fpsimd_flush_cpu_state() hook in kvm_arch_vcpu_ioctl_run() is for.
The alternative would have been for KVM to save/restore the host SVE
state directly, but this seemed premature and invasive in the absence
of full KVM SVE support.

This means that KVM's own save/restore of the host's FPSIMD state
becomes redundant in this case, but since there is no SVE hardware
yet, I favoured correctness over optimal performance here.


My point here was that we could modify this hook to always save off the
host FPSIMD state unconditionally before entering the guts of KVM,
instead of only doing it when there is live SVE state.  The benefit of
this is that the host context switch machinery knows if the state has
already been saved and won't do it again.  Thus a kvm userspace -> vcpu
(-> guest exit -> vcpu)* -> guest_exit sequence of arbitrary length
will only save the host FPSIMD (or SVE) state once, and won't restore
it at all (assuming no context switches).

Instead, the user thread's FPSIMD state is only reloaded on the final
return to userspace.

> > 
> > This would result in a redundant save, but only when the host fpsimd
> > state is dirty and the guest vcpu doesn't touch fpsimd before trapping
> > back to the host.
> > 
> > For the host, the fpsimd state is only dirty after entering the kernel
> > from userspace (or after certain other things like sigreturn or ptrace).
> > So this approach would still avoid repeated save/restore when cycling
> > between the guest and the kvm code in the host.
> > 
> 
> I see.
> 
> > > It might very well be possible to change the logic so that we store the
> > > host state in the same place where task_fpsimd_save() would have, and I
> > > think that would make what you suggest possible.
> > 
> > That's certainly possible, but I viewed that as harder.  It would be
> > necessary to map the host thread_struct into hyp etc. etc.
> > 
> 
> And even then, unnecessary because it would duplicate the existing state
> save, IIUC above.

Agreed.  The main disadvantage of my approach is that the host state
cannot be saved lazily any more, but at least it is only saved once
for a given vcpu run loop.

> > > I'd like to make that a separate change from this patch though, as we're
> > > already changing quite a bit with this series, so I'm trying to make any
> > > logical change as contained per patch as possible, so that problems can
> > > be spotted by bisecting.
> > 
> > Yes, I think that's wise.
> > 
> 
> ok, I'll try to incorporate this as a separate patch for the next
> revision.
> 
> > > > Ultimately I'd like to go one better and actually treat a vcpu as a
> > > > first-class fpsimd context, so that taking an interrupt to the host
> > > > and then reentering the guest doesn't cause any reload at all.  
> > > 
> > > That should be the case already; kvm_vcpu_put_sysregs() is only called
> > > when you run another thread (preemptively or voluntarily), or when you
> > > return to user space, but making the vcpu fpsimd context a first-class
> > > citizen fpsimd context would mean that you can run another thread (and
> > > maybe run userspace if it doesn't use fpsimd?) without having to
> > > save/restore anything.  Am I getting this right?
> > 
> > Yes (except that if a return to userspace happens then FPSIMD will be
> > restored at that point: there is no laziness there -- it _could_
> > be lazy, but it's deemed unlikely to be a performance win due to the
> > fact that the compiler can and does generate FPSIMD code quite
> > liberally by default).
> > 
> > For the case of being preempted within the kernel with no ret_to_user,
> > you are correct.
> > 
> 
> ok, that would indeed also be useful for things like switching to a
> vhost thread and returning to the vcpu thread.

What's a vhost thread?

> > > 
> > > > But
> > > > that feels like too big a step for this series, and there are likely
> > > > side-issues I've not thought about yet.
> > > > 
> > > 
> > > It should definitely be in separate patches, but I would be open to
> > > tagging something on to the end of this series if we can stabilize this
> > > series early after -rc1 is out.
> > 
> > I haven't fully got my head around it, but we can see where we get to.
> > Best not to rush into it if there's any doubt...
> > 
> Agreed, we can always add things later.

Cheers
---Dave
Christoffer Dall Feb. 13, 2018, 8:51 a.m. UTC | #7
On Fri, Feb 09, 2018 at 03:59:30PM +0000, Dave Martin wrote:
> On Wed, Feb 07, 2018 at 06:56:44PM +0100, Christoffer Dall wrote:
> > On Wed, Feb 07, 2018 at 04:49:55PM +0000, Dave Martin wrote:
> > > On Thu, Jan 25, 2018 at 08:46:53PM +0100, Christoffer Dall wrote:
> > > > On Mon, Jan 22, 2018 at 05:33:28PM +0000, Dave Martin wrote:
> > > > > On Fri, Jan 12, 2018 at 01:07:15PM +0100, Christoffer Dall wrote:
> > > > > > Avoid saving the guest VFP registers and restoring the host VFP
> > > > > > registers on every exit from the VM.  Only when we're about to run
> > > > > > userspace or other threads in the kernel do we really have to switch the
> > > > > > state back to the host state.
> > > > > > 
> > > > > > We still initially configure the VFP registers to trap when entering the
> > > > > > VM, but the difference is that we now leave the guest state in the
> > > > > > hardware registers as long as we're running this VCPU, even if we
> > > > > > occasionally trap to the host, and we only restore the host state when
> > > > > > we return to user space or when scheduling another thread.
> > > > > > 
> > > > > > Reviewed-by: Andrew Jones <drjones@redhat.com>
> > > > > > Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
> > > > > > Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> > > > > 
> > > > > [...]
> > > > > 
> > > > > > diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
> > > > > > index 883a6383cd36..848a46eb33bf 100644
> > > > > > --- a/arch/arm64/kvm/hyp/sysreg-sr.c
> > > > > > +++ b/arch/arm64/kvm/hyp/sysreg-sr.c
> > > > > 
> > > > > [...]
> > > > > 
> > > > > > @@ -213,6 +215,19 @@ void kvm_vcpu_load_sysregs(struct kvm_vcpu *vcpu)
> > > > > >   */
> > > > > >  void kvm_vcpu_put_sysregs(struct kvm_vcpu *vcpu)
> > > > > >  {
> > > > > > +	struct kvm_cpu_context *host_ctxt = vcpu->arch.host_cpu_context;
> > > > > > +	struct kvm_cpu_context *guest_ctxt = &vcpu->arch.ctxt;
> > > > > > +
> > > > > > +	/* Restore host FP/SIMD state */
> > > > > > +	if (vcpu->arch.guest_vfp_loaded) {
> > > > > > +		if (vcpu_el1_is_32bit(vcpu)) {
> > > > > > +			kvm_call_hyp(__fpsimd32_save_state,
> > > > > > +				     kern_hyp_va(guest_ctxt));
> > > > > > +		}
> > > > > > +		__fpsimd_save_state(&guest_ctxt->gp_regs.fp_regs);
> > > > > > +		__fpsimd_restore_state(&host_ctxt->gp_regs.fp_regs);
> > > > > > +		vcpu->arch.guest_vfp_loaded = 0;
> > > > > 
> > > > > Provided we've already marked the host FPSIMD state as dirty on the way
> > > > > in, we probably don't need to restore it here.
> > > > > 
> > > > > In v4.15, the kvm_fpsimd_flush_cpu_state() call in
> > > > > kvm_arch_vcpu_ioctl_run() is supposed to do this marking: currently
> > > > > it's only done for SVE, since KVM was previously restoring the host
> > > > > FPSIMD subset of the state anyway, but it could be made unconditional.
> > > > > 
> > > > > For a returning run ioctl, this would have the effect of deferring the
> > > > > host FPSIMD reload until we return to userspace, which is probably
> > > > > no more costly since the kernel must check whether to do this in
> > > > > ret_to_user anyway; OTOH if the vcpu thread was preempted by some
> > > > > other thread we save the cost of restoring the host state entirely here
> > > > > ... I think.
> > > > 
> > > > Yes, I agree.  However, the low-level logic in
> > > > arch/arm64/kvm/hyp/entry.S:__fpsimd_guest_restore currently saves the
> > > > host state into vcpu->arch.host_cpu_context->gp_regs.fp_regs (where
> > > > host_cpu_context is a KVM-specific per-cpu variable).  I think this
> > > > means that simply marking the state as invalid would cause the kernel
> > > > to restore some potentially stale values when returning to userspace.
> > > > Am I missing something?
> > > 
> > > I think my point was that there would be no need for the low-level
> > > save of the host fpsimd state currently done by hyp.  At all.  The
> > > state would already have been saved off to thread_struct before
> > > entering the guest.
> > 
> > Ah, so if userspace touched any FPSIMD state, then we always save that
> > state when entering the kernel, even if we're just going to return to
> > the same userspace process anyway?  (For any system call etc.?)
> 
> Not exactly.  The state is saved when the corresponding user task is
> scheduled out, or when it would become stale because we're about to
> modify the task_struct view of the state (sigreturn/PTRACE_SETREGSET).
> The state is also saved if necessary by kernel_neon_begin().
> 
> Simply entering the kernel and returning to userspace doesn't have
> this effect by itself.
> 
> 
> Prior to the SVE patches, KVM makes itself orthogonal to the host
> context switch machinery by ensuring that whatever the host had
> in the FPSIMD regs at guest entry is restored before returning to
> the host. (IIUC)  

Only if the guest actually touches FPSIMD state.  If the guest doesn't
touch FPSIMD (no trap to EL2), then we never touch the state at all.

> This means that redundant save/restore work is
> done by KVM, but does have the advantage of simplicity.

I don't understand what the redundant part here is?  Isn't it only
redundant in the case where the host (for some reason) has already saved
its FPSIMD state?  I assume that won't be the common case, since
"userspace->kernel->kvm_run" won't save the FPSIMD state, as you just
explained above.

> 
> This breaks for SVE though: the high bits of the Z-registers will be
> zeroed as a side effect of the FPSIMD save/restore done by KVM.
> This means that if the host has state in those bits then it must
> > be saved before entering the guest: that's what the new
> kvm_fpsimd_flush_cpu_state() hook in kvm_arch_vcpu_ioctl_run() is for.

Again, I'm confused, because to me it looks like
kvm_fpsimd_flush_cpu_state() boils down to fpsimd_flush_cpu_state()
which just sets a pointer to NULL, but doesn't actually save the state.

So, when is the state in the hardware registers saved to memory?

> The alternative would have been for KVM to save/restore the host SVE
> state directly, but this seemed premature and invasive in the absence
> of full KVM SVE support.
> 
> This means that KVM's own save/restore of the host's FPSIMD state
> becomes redundant in this case, but since there is no SVE hardware
> yet, I favoured correctness over optimal performance here.
> 

I agree with the approach, I guess I just can't seem to follow the code
correctly...

> 
> My point here was that we could modify this hook to always save off the
> host FPSIMD state unconditionally before entering the guts of KVM,
> instead of only doing it when there is live SVE state.  The benefit of
> this is that the host context switch machinery knows if the state has
> already been saved and won't do it again.  Thus a kvm userspace -> vcpu
> (-> guest exit -> vcpu)* -> guest_exit sequence of arbitrary length
> will only save the host FPSIMD (or SVE) state once, and won't restore
> it at all (assuming no context switches).
> 
> Instead, the user thread's FPSIMD state is only reloaded on the final
> return to userspace.
> 

I think that would invert the logic we have now, so instead of only
saving/restoring the FPSIMD state when the guest uses it (as we do now),
we would only save/restore the FPSIMD state when the host uses it,
regardless of what the guest does.

Ideally, we could have a combination of both, but it's unclear to me if
we have good indications that one case is more likely than the other.

My gut feeling though, is that the guest will be likely to often access
FPSIMD state for as long as we're in KVM_RUN, and that host userspace
also often uses FPSIMD (memcopy, etc.), but the rest of the host kernel
(kernel threads etc.) is unlikely to use FPSIMD for a system that is
primarily running VMs.

> > > 
> > > This would result in a redundant save, but only when the host fpsimd
> > > state is dirty and the guest vcpu doesn't touch fpsimd before trapping
> > > back to the host.
> > > 
> > > For the host, the fpsimd state is only dirty after entering the kernel
> > > from userspace (or after certain other things like sigreturn or ptrace).
> > > So this approach would still avoid repeated save/restore when cycling
> > > between the guest and the kvm code in the host.
> > > 
> > 
> > I see.
> > 
> > > > It might very well be possible to change the logic so that we store the
> > > > host state in the same place where task_fpsimd_save() would have, and I
> > > > think that would make what you suggest possible.
> > > 
> > > That's certainly possible, but I viewed that as harder.  It would be
> > > necessary to map the host thread_struct into hyp etc. etc.
> > > 
> > 
> > And even then, unnecessary because it would duplicate the existing state
> > save, IIUC above.
> 
> Agreed.  The main disadvantage of my approach is that the host state
> cannot be saved lazily any more, but at least it is only saved once
> for a given vcpu run loop.
> 
> > > > I'd like to make that a separate change from this patch though, as we're
> > > > already changing quite a bit with this series, so I'm trying to make any
> > > > logical change as contained per patch as possible, so that problems can
> > > > be spotted by bisecting.
> > > 
> > > Yes, I think that's wise.
> > > 
> > 
> > ok, I'll try to incorporate this as a separate patch for the next
> > revision.
> > 
> > > > > Ultimately I'd like to go one better and actually treat a vcpu as a
> > > > > first-class fpsimd context, so that taking an interrupt to the host
> > > > > and then reentering the guest doesn't cause any reload at all.  
> > > > 
> > > > That should be the case already; kvm_vcpu_put_sysregs() is only called
> > > > when you run another thread (preemptively or voluntarily), or when you
> > > > return to user space, but making the vcpu fpsimd context a first-class
> > > > citizen fpsimd context would mean that you can run another thread (and
> > > > maybe run userspace if it doesn't use fpsimd?) without having to
> > > > save/restore anything.  Am I getting this right?
> > > 
> > > Yes (except that if a return to userspace happens then FPSIMD will be
> > > restored at that point: there is no laziness there -- it _could_
> > > be lazy, but it's deemed unlikely to be a performance win due to the
> > > fact that the compiler can and does generate FPSIMD code quite
> > > liberally by default).
> > > 
> > > For the case of being preempted within the kernel with no ret_to_user,
> > > you are correct.
> > > 
> > 
> > ok, that would indeed also be useful for things like switching to a
> > vhost thread and returning to the vcpu thread.
> 
> What's a vhost thread?
> 

vhost is the offload mechanism for the data-path of virtio devices, and
for example if you have an application in your VM which is sending data,
the VM will typically fill some buffer and then cause a trap to the host
kernel by writing to an emulated PCI MMIO register, and that trap is
handled by KVM's IO bus infrastructure, which schedules a vhost kernel
thread which actually sends the data from the buffer shared between the
VM and the host kernel onto the physical network.

Thanks,
-Christoffer
Christoffer Dall Feb. 13, 2018, 8:52 a.m. UTC | #8
On Fri, Feb 09, 2018 at 03:26:59PM +0000, Julien Grall wrote:
> Hi Christoffer,
> 
> On 01/12/2018 12:07 PM, Christoffer Dall wrote:
> >Avoid saving the guest VFP registers and restoring the host VFP
> >registers on every exit from the VM.  Only when we're about to run
> >userspace or other threads in the kernel do we really have to switch the
> 
> s/do// ?
> 
> >state back to the host state.
> >
> >We still initially configure the VFP registers to trap when entering the
> >VM, but the difference is that we now leave the guest state in the
> >hardware registers as long as we're running this VCPU, even if we
> >occasionally trap to the host, and we only restore the host state when
> >we return to user space or when scheduling another thread.
> >
> >Reviewed-by: Andrew Jones <drjones@redhat.com>
> >Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
> >Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> >---
> >  arch/arm64/include/asm/kvm_host.h |  3 +++
> >  arch/arm64/kernel/asm-offsets.c   |  1 +
> >  arch/arm64/kvm/hyp/entry.S        |  3 +++
> >  arch/arm64/kvm/hyp/switch.c       | 48 ++++++++++++---------------------------
> >  arch/arm64/kvm/hyp/sysreg-sr.c    | 21 ++++++++++++++---
> >  5 files changed, 40 insertions(+), 36 deletions(-)
> >
> >diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> >index 0e9e7291a7e6..9e23bc968668 100644
> >--- a/arch/arm64/include/asm/kvm_host.h
> >+++ b/arch/arm64/include/asm/kvm_host.h
> >@@ -213,6 +213,9 @@ struct kvm_vcpu_arch {
> >  	/* Guest debug state */
> >  	u64 debug_flags;
> >+	/* 1 if the guest VFP state is loaded into the hardware */
> >+	u8 guest_vfp_loaded;
> >+
> >  	/*
> >  	 * We maintain more than a single set of debug registers to support
> >  	 * debugging the guest from the host and to maintain separate host and
> >diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
> >index 612021dce84f..99467327c043 100644
> >--- a/arch/arm64/kernel/asm-offsets.c
> >+++ b/arch/arm64/kernel/asm-offsets.c
> >@@ -133,6 +133,7 @@ int main(void)
> >    DEFINE(CPU_GP_REGS,		offsetof(struct kvm_cpu_context, gp_regs));
> >    DEFINE(CPU_USER_PT_REGS,	offsetof(struct kvm_regs, regs));
> >    DEFINE(CPU_FP_REGS,		offsetof(struct kvm_regs, fp_regs));
> >+  DEFINE(VCPU_GUEST_VFP_LOADED,	offsetof(struct kvm_vcpu, arch.guest_vfp_loaded));
> >    DEFINE(VCPU_FPEXC32_EL2,	offsetof(struct kvm_vcpu, arch.ctxt.sys_regs[FPEXC32_EL2]));
> >    DEFINE(VCPU_HOST_CONTEXT,	offsetof(struct kvm_vcpu, arch.host_cpu_context));
> >    DEFINE(HOST_CONTEXT_VCPU,	offsetof(struct kvm_cpu_context, __hyp_running_vcpu));
> >diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
> >index a360ac6e89e9..53652287a236 100644
> >--- a/arch/arm64/kvm/hyp/entry.S
> >+++ b/arch/arm64/kvm/hyp/entry.S
> >@@ -184,6 +184,9 @@ alternative_endif
> >  	add	x0, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
> >  	bl	__fpsimd_restore_state
> >+	mov	x0, #1
> >+	strb	w0, [x3, #VCPU_GUEST_VFP_LOADED]
> >+
> >  	// Skip restoring fpexc32 for AArch64 guests
> >  	mrs	x1, hcr_el2
> >  	tbnz	x1, #HCR_RW_SHIFT, 1f
> >diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> >index 12dc647a6e5f..29e44a20f5e3 100644
> >--- a/arch/arm64/kvm/hyp/switch.c
> >+++ b/arch/arm64/kvm/hyp/switch.c
> >@@ -24,43 +24,32 @@
> >  #include <asm/fpsimd.h>
> >  #include <asm/debug-monitors.h>
> >-static bool __hyp_text __fpsimd_enabled_nvhe(void)
> >-{
> >-	return !(read_sysreg(cptr_el2) & CPTR_EL2_TFP);
> >-}
> >-
> >-static bool __hyp_text __fpsimd_enabled_vhe(void)
> >-{
> >-	return !!(read_sysreg(cpacr_el1) & CPACR_EL1_FPEN);
> >-}
> >-
> >-static hyp_alternate_select(__fpsimd_is_enabled,
> >-			    __fpsimd_enabled_nvhe, __fpsimd_enabled_vhe,
> >-			    ARM64_HAS_VIRT_HOST_EXTN);
> >-
> >-bool __hyp_text __fpsimd_enabled(void)
> 
> Now that __fpsimd_enabled is removed, I think you need to remove the
> prototype in arch/arm64/include/kvm_hyp.h too.
> 
Will do.

Thanks,
-Christoffer
Dave Martin Feb. 13, 2018, 2:08 p.m. UTC | #9
On Tue, Feb 13, 2018 at 09:51:30AM +0100, Christoffer Dall wrote:
> On Fri, Feb 09, 2018 at 03:59:30PM +0000, Dave Martin wrote:
> > On Wed, Feb 07, 2018 at 06:56:44PM +0100, Christoffer Dall wrote:
> > > On Wed, Feb 07, 2018 at 04:49:55PM +0000, Dave Martin wrote:

[...]

> > Simply entering the kernel and returning to userspace doesn't have
> > this effect by itself.
> > 
> > 
> > Prior to the SVE patches, KVM makes itself orthogonal to the host
> > context switch machinery by ensuring that whatever the host had
> > in the FPSIMD regs at guest entry is restored before returning to
> > the host. (IIUC)  
> 
> Only if the guest actually touches FPSIMD state.  If the guest doesn't
> touch FPSIMD (no trap to EL2), then we never touch the state at all.

I should have been clearer: KVM ensures that the state is _unchanged_
before returning to the host, but can elide the save/restore when the
guest doesn't touch the state...

> 
> > This means that redundant save/restore work is
> > done by KVM, but does have the advantage of simplicity.
> 
> I don't understand what the redundant part here is?  Isn't it only
> redundant in the case where the host (for some reason) has already saved
> its FPSIMD state?  I assume that won't be the common case, since
> "userspace->kernel->kvm_run" won't save the FPSIMD state, as you just
> explained above.

...however, when this elision does not occur, it may duplicate
save/restore done by the kernel, or it may save/restore worthless data
if the host's FPSIMD state is non-live at the time.

It's hard to gauge the impact of this: it seems unlikely to make a
massive difference, but will be highly workload-dependent.


The redundancy occurs because of the deferred restore of the FPSIMD
registers for host userspace: as a result, the host FPSIMD regs are
either discardable (i.e., already saved) or not live at all between
one context switch and the next ret_to_user.

This means that if the vcpu run loop is preempted, then when the host
switches back to the run loop it is pointless to save or restore the
host FPSIMD state.

A typical sequence of events exposing this redundancy would be as
follows.  I assume here that there are two cpu-bound tasks A and B
competing for a host CPU, where A is a vcpu thread:

 - vcpu A is in the guest running a compute-heavy task
 - FPSIMD typically traps to the host before context switch
 X kvm saves the host FPSIMD state
 - kvm loads the guest FPSIMD state
 - vcpu A reenters the guest
 - host context switch IRQ preempts A back to the run loop
 Y kvm loads the host FPSIMD state via vcpu_put

 - host context switch:
 - TIF_FOREIGN_FPSTATE is set -> no save of user FPSIMD state
 - switch to B
 - B reaches ret_to_user
 Y B's user FPSIMD state is loaded: TIF_FOREIGN_FPSTATE now clear
 - B enters userspace

 - host context switch:
 - B enters kernel
 X TIF_FOREIGN_FPSTATE now set -> host saves B's FPSIMD state
 - switch to A -> set TIF_FOREIGN_FPSTATE for A
 - back to the KVM run loop

 - vcpu A enters guest
 - redo from start

Here, the two saves marked X are redundant with respect to each other,
and the two restores marked Y are redundant with respect to each other.

> > This breaks for SVE though: the high bits of the Z-registers will be
> > zeroed as a side effect of the FPSIMD save/restore done by KVM.
> > This means that if the host has state in those bits then it must
> > be saved before entring the guest: that's what the new
> > kvm_fpsimd_flush_cpu_state() hook in kvm_arch_vcpu_ioctl_run() is for.
> 
> Again, I'm confused, because to me it looks like
> kvm_fpsimd_flush_cpu_state() boils down to fpsimd_flush_cpu_state()
> which just sets a pointer to NULL, but doesn't actually save the state.
> 
> So, when is the state in the hardware registers saved to memory?

This _is_ quite confusing: in writing this answer I identified a bug
and then realised why there is no bug...

kvm_fpsimd_flush_cpu_state() is just an invalidation.  No state is
actually saved today because we explicitly don't care about preserving
the SVE state, because the syscall ABI throws the SVE regs away as
a side effect of any syscall, including ioctl(KVM_RUN); also (currently) KVM
ensures that the non-SVE FPSIMD bits _are_ restored by itself.

I think my proposal is that this hook might take on the role of
actually saving the state too, if we move that out of the KVM host
context save/restore code.

Perhaps we could even replace

	preempt_disable();
	kvm_fpsimd_flush_cpu_state();
	/* ... */
	preempt_enable();

with

	kernel_neon_begin();
	/* ... */
	kernel_neon_end();

which does have the host user context saving built in -- though this
may have unwanted side effects, such as masking softirqs.  Possibly not
a big deal though if the region is kept small (?)


<aside>

Understanding precisely what kvm_fpsimd_flush_cpu_state() does is
not trivial...  the explanation goes something like this:

(*takes deep breath*)

A link is maintained between tasks and CPUs to indicate whether a
given CPU has the task's FPSIMD state in its regs.

For brevity, I'll describe this link as a relation loaded(task, cpu).

	loaded(current, smp_processor_id()) <->
		!test_thread_flag(TIF_FOREIGN_FPSTATE).

(In effect, TIF_FOREIGN_FPSTATE caches this relation for current.)

For non-current tasks, the relation is something like

	loaded(task, cpu) <->
		&task->thread.fpsimd_state ==
			per_cpu(fpsimd_last_state.st, cpu) &&
		task->thread.fpsimd_state.cpu == cpu.

There are subtleties about when these equivalences are meaningful
and how they can be checked safely that I'll gloss over here --
to get an idea, see cb968afc7898 ("arm64/sve: Avoid dereference of dead
task_struct in KVM guest entry").

 * loaded(task, cpu) is made false for all cpus and a given task
   by fpsimd_flush_task_state(task).

   This is how we invalidate a stale copy of some task's state when
   the kernel deliberately changes the state (e.g., exec, sigreturn,
   PTRACE_SETREGSET).

 * loaded(task, smp_processor_id()) is made false for all tasks
   by fpsimd_flush_cpu_state().

   This is how we avoid using the FPSIMD regs of some CPU that
   the kernel trashed (e.g., kernel_mode_neon, KVM) as a source
   of any task's FPSIMD state.

 * loaded(current, smp_processor_id()) is made true by
   fpsimd_bind_to_cpu().

   fpsimd_bind_to_cpu() also implies the effects of
   fpsimd_flush_task_state(current) and
   fpsimd_flush_cpu_state(smp_processor_id()) before the new relation is
   established.  This is not explicit in the code, but falls out from
   the way the relation is represented.


( There is a wrinkle here: fpsimd_flush_task_state(task) should always
be followed by set_thread_flag(TIF_FOREIGN_FPSTATE) if task == current.
fpsimd_flush_cpu_state() should similarly set that flag, otherwise the
garbage left in the SVE bits by KVM's save/restore may spuriously
appear in the vcpu thread's user regs.  But since that data will be (a)
zeros or (b) the task's own data; and because TIF_SVE is cleared in
entry.S:el0_svc as a side-effect of the ioctl(KVM_RUN) syscall, I don't
think this matters in practice.

If we extend kvm_fpsimd_flush_cpu_state() to invalidate in the non-SVE
case too then this becomes significant and we _would_ need to clear
TIF_FOREIGN_FPSTATE to avoid the guest's FPSIMD regs appearing in the
vcpu user thread. )

</aside>

> > The alternative would have been for KVM to save/restore the host SVE
> > state directly, but this seemed premature and invasive in the absence
> > of full KVM SVE support.
> > 
> > This means that KVM's own save/restore of the host's FPSIMD state
> > becomes redundant in this case, but since there is no SVE hardware
> > yet, I favoured correctness over optimal performance here.
> > 
> 
> I agree with the approach, I guess I just can't seem to follow the code
> correctly...

Understandable... even trying to remember how it works is giving me a
headache #P

> 
> > 
> > My point here was that we could modify this hook to always save off the
> > host FPSIMD state unconditionally before entering the guts of KVM,
> > instead of only doing it when there is live SVE state.  The benefit of
> > this is that the host context switch machinery knows if the state has
> > already been saved and won't do it again.  Thus a kvm userspace -> vcpu
> > (-> guest exit -> vcpu)* -> guest_exit sequence of arbitrary length
> > will only save the host FPSIMD (or SVE) state once, and won't restore
> > it at all (assuming no context switches).
> > 
> > Instead, the user thread's FPSIMD state is only reloaded on the final
> > return to userspace.
> > 
> 
> I think that would invert the logic we have now, so instead of only
> saving/restoring the FPSIMD state when the guest uses it (as we do now),
> we would only save/restore the FPSIMD state when the host uses it,
> regardless of what the guest does.

I'm not sure that's a complete characterisation of what's going on,
but I'm struggling to describe my view in simple terms.

> Ideally, we could have a combination of both, but it's unclear to me if
> we have good indications that one case is more likely than the other.
> 
> My gut feeling though, is that the guest will be likely to often access
> FPSIMD state for as long as we're in KVM_RUN, and that host userspace
> also often uses FPSIMD (memcopy, etc.), but the rest of the host kernel
> (kernel threads etc.) is unlikely to use FPSIMD for a system that is
> primarily running VMs.

I think my suggestion:

 * neither adds nor elides any saves or restores of the guest context;
 * elides some saves and restores of the host context;
 * moves the host context save with respect to your series in those
   cases where it does occur; and
 * adds 1 host context save for each preempt-or-enter-userspace ...
   preempt-or-enter-userspace interval of a vcpu thread during which
   the guest does not use FPSIMD.
  
The last bullet is the only one that can add cost.  I can imagine
hitting this during an I/O emulation storm.  I feel that most of the
rest of the time the change would be a net win, but it's hard to gauge
the overall impact.


Migrating to using the host context switch machinery as-is for
managing the guest FPSIMD context would allow all the redundant
saves/restores to be eliminated.

It would be a more invasive change though, and I don't think this
series should attempt it.

> > > > Yes (except that if a return to userspace happens then FPSIMD will be
> > > > restored at that point: there is no laziness there -- it _could_
> > > > be lazy, but it's deemed unlikely to be a performance win due to the
> > > > fact that the compiler can and does generate FPSIMD code quite
> > > > liberally by default).
> > > > 
> > > > For the case of being preempted within the kernel with no ret_to_user,
> > > > you are correct.
> > > > 
> > > 
> > > ok, that would indeed also be useful for things like switching to a
> > > vhost thread and returning to the vcpu thread.
> > 
> > What's a vhost thread?
> > 
> 
> vhost is the offload mechanism for the data-path of virtio devices, and
> for example if you have an application in your VM which is sending data,
> the VM will typically fill some buffer and then cause a trap to the host
> kernel by writing to an emulated PCI MMIO register, and that trap is
> handled by KVM's IO bus infrastructure, which schedules a vhost kernel
> thread which actually sends the data from the buffer shared between the
> VM and the host kernel onto the physical network.

Ah, right.

Cheers
---Dave
Christoffer Dall Feb. 14, 2018, 10:15 a.m. UTC | #10
On Tue, Feb 13, 2018 at 02:08:47PM +0000, Dave Martin wrote:
> On Tue, Feb 13, 2018 at 09:51:30AM +0100, Christoffer Dall wrote:
> > On Fri, Feb 09, 2018 at 03:59:30PM +0000, Dave Martin wrote:
> > > On Wed, Feb 07, 2018 at 06:56:44PM +0100, Christoffer Dall wrote:
> > > > On Wed, Feb 07, 2018 at 04:49:55PM +0000, Dave Martin wrote:
> 
> [...]
> 
> > > Simply entering the kernel and returning to userspace doesn't have
> > > this effect by itself.
> > > 
> > > 
> > > Prior to the SVE patches, KVM makes itself orthogonal to the host
> > > context switch machinery by ensuring that whatever the host had
> > > in the FPSIMD regs at guest entry is restored before returning to
> > > the host. (IIUC)  
> > 
> > Only if the guest actually touches FPSIMD state.  If the guest doesn't
> > touch FPSIMD (no trap to EL2), then we never touch the state at all.
> 
> I should have been clearer: KVM ensures that the state is _unchanged_
> before returning to the host, but can elide the save/restore when the
> guest doesn't touch the state...
> 
> > 
> > > This means that redundant save/restore work is
> > > done by KVM, but does have the advantage of simplicity.
> > 
> > I don't understand what the redundant part here is?  Isn't it only
> > redundant in the case where the host (for some reason) has already saved
> > its FPSIMD state?  I assume that won't be the common case, since
> > "userspace->kernel->kvm_run" won't save the FPSIMD state, as you just
> > explained above.
> 
> ...however, when this elision does not occur, it may duplicate
> save/restore done by the kernel, or it may save/restore worthless data
> if the host's FPSIMD state is non-live at the time.
> 
> It's hard to gauge the impact of this: it seems unlikely to make a
> massive difference, but will be highly workload-dependent.

So I thought it might be useful to have some idea of the frequency of
events on a balanced workload, so I ran an 8-way SMP guest on Ubuntu
14.04 running SPECjvm2008, a memcached benchmark, a MySQL workloads, and
some networking benchmarks, and I counted a few events:

 - Out of all the exits, from the guest to run-loop in EL1 on a non-VHE
   system, fewer than 1% of them result in an exit to userspace (0.57%).

 - The VCPU thread was preempted (voluntarily or forced) in the kernel
   less than 3% of the exits (2.72%).  That's just below 5 preemptions
   per ioctl(KVM_RUN).

 - In 29% of the preemptions (vcpu_put), the guest had touched FPSIMD
   registers and the host context was restored.

 - We store the host context about 1.38 times per ioctl(KVM_RUN).

So that tells me that (1) it's worth restoring the guest FPSIMD state
lazily as opposed to proactively on vcpu_load, and (2) that there's a
small opportunity for improvement by reducing redundant host vfp state
saves.

> 
> 
> The redundancy occurs because of the deferred restore of the FPSIMD
> registers for host userspace: as a result, the host FPSIMD regs are
> either discardable (i.e., already saved) or not live at all between
> one context switch and the next ret_to_user.
> 
> This means that if the vcpu run loop is preempted, then when the host
> switches back to the run loop it is pointless to save or restore the
> host FPSIMD state.
> 
> A typical sequence of events exposing this redundancy would be as
> follows.  I assume here that there are two cpu-bound tasks A and B
> competing for a host CPU, where A is a vcpu thread:
> 
>  - vcpu A is in the guest running a compute-heavy task
>  - FPSIMD typically traps to the host before context switch
>  X kvm saves the host FPSIMD state
>  - kvm loads the guest FPSIMD state
>  - vcpu A reenters the guest
>  - host context switch IRQ preempts A back to the run loop
>  Y kvm loads the host FPSIMD state via vcpu_put
> 
>  - host context switch:
>  - TIF_FOREIGN_FPSTATE is set -> no save of user FPSIMD state
>  - switch to B
>  - B reaches ret_to_user
>  Y B's user FPSIMD state is loaded: TIF_FOREIGN_FPSTATE now clear
>  - B enters userspace
> 
>  - host context switch:
>  - B enters kernel
>  X TIF_FOREIGN_FPSTATE now set -> host saves B's FPSIMD state
>  - switch to A -> set TIF_FOREIGN_FPSTATE for A
>  - back to the KVM run loop
> 
>  - vcpu A enters guest
>  - redo from start
> 
> Here, the two saves marked X are redundant with respect to each other,
> and the two restores marked Y are redundant with respect to each other.
> 

Right, ok, but if we have

 - ioctl(KVM_RUN)
 - mark hardware FPSIMD register state as invalid
 - load guest FPSIMD state
 - enter guest
 - exit guest
 - save guest FPSIMD state
 - return to user space

(I.e. we don't do any preemption in the guest)

Then we'll lose the host FPSIMD register state, potentially, right?

Your original comment on this patch was that we didn't need to restore
the host FPSIMD state in kvm_vcpu_put_sysregs, which would result in the
scenario above.  The only way I can see this working is by making sure
that kvm_fpsimd_flush_cpu_state() also saves the FPSIMD hardware
register state if the state is live.

Am I still missing something?
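
(For my own understanding, the variant I think you are describing would
look roughly like the sketch below -- pseudocode only, with hypothetical
helper semantics, not a proposed implementation:)

```c
/*
 * Sketch: hypothetical extension of kvm_fpsimd_flush_cpu_state().
 * If the host's FPSIMD state is live in the hardware registers,
 * save it before KVM clobbers them, then invalidate the cpu/task
 * link as the hook does today.
 */
void kvm_fpsimd_flush_cpu_state(void)
{
	if (!test_thread_flag(TIF_FOREIGN_FPSTATE))
		fpsimd_save_state(&current->thread.fpsimd_state);
	set_thread_flag(TIF_FOREIGN_FPSTATE);
	fpsimd_flush_cpu_state();	/* existing invalidation */
}
```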

> > > This breaks for SVE though: the high bits of the Z-registers will be
> > > zeroed as a side effect of the FPSIMD save/restore done by KVM.
> > > This means that if the host has state in those bits then it must
> > > be saved before entering the guest: that's what the new
> > > kvm_fpsimd_flush_cpu_state() hook in kvm_arch_vcpu_ioctl_run() is for.
> > 
> > Again, I'm confused, because to me it looks like
> > kvm_fpsimd_flush_cpu_state() boils down to fpsimd_flush_cpu_state()
> > which just sets a pointer to NULL, but doesn't actually save the state.
> > 
> > So, when is the state in the hardware registers saved to memory?
> 
> This _is_ quite confusing: in writing this answer I identified a bug
> and then realised why there is no bug...
> 
> kvm_fpsimd_flush_cpu_state() is just an invalidation.  No state is
> actually saved today because we explicitly don't care about preserving
> the SVE state, because the syscall ABI throws the SVE regs away as
> a side effect of any syscall, including ioctl(KVM_RUN); also (currently) KVM
> ensures that the non-SVE FPSIMD bits _are_ restored by itself.
> 
> I think my proposal is that this hook might take on the role of
> actually saving the state too, if we move that out of the KVM host
> context save/restore code.
> 
> Perhaps we could even replace
> 
> 	preempt_disable();
> 	kvm_fpsimd_flush_cpu_state();
> 	/* ... */
> 	preempt_enable();
> 
> with
> 
> 	kernel_neon_begin();
> 	/* ... */
> 	kernel_neon_end();

I'm not entirely sure where the begin and end points would be in the
context of KVM?

> 
> which does have the host user context saving built in -- though this
> may have unwanted side effects, such as masking softirqs.  Possibly not
> a big deal though if the region is kept small (?)
> 

I'm not sure I fully understand how this would work, so it's hard for me
to comment on.

> 
> <aside>
> 
> Understanding precisely what kvm_fpsimd_flush_cpu_state() does is
> not trivial...  the explanation goes something like this:
> 
> (*takes deep breath*)
> 
> A link is maintained between tasks and CPUs to indicate whether a
> given CPU has the task's FPSIMD state in its regs.
> 
> For brevity, I'll describe this link as a relation loaded(task, cpu).
> 
> 	loaded(current, smp_processor_id()) <->
> 		!test_thread_flag(TIF_FOREIGN_FPSTATE).
> 
> (In effect, TIF_FOREIGN_FPSTATE caches this relation for current.)
> 
> For non-current tasks, the relation is something like
> 
> 	loaded(task, cpu) <->
> 		&task->thread.fpsimd_state ==
> 			per_cpu(fpsimd_last_state.st, cpu) &&
> 		task->thread.fpsimd_state.cpu == cpu.
> 
> There are subtleties about when these equivalences are meaningful
> and how they can be checked safely that I'll gloss over here --
> to get an idea, see cb968afc7898 ("arm64/sve: Avoid dereference of dead
> task_struct in KVM guest entry").
> 
>  * loaded(task, cpu) is made false for all cpus and a given task
>    by fpsimd_flush_task_state(task).
> 
>    This is how we invalidate a stale copy of some task's state when
>    the kernel deliberately changes the state (e.g., exec, sigreturn,
>    PTRACE_SETREGSET).
> 
>  * loaded(task, smp_processor_id()) is made false for all tasks
>    by fpsimd_flush_cpu_state().
> 
>    This is how we avoid using the FPSIMD regs of some CPU that
>    the kernel trashed (e.g., kernel_mode_neon, KVM) as a source
>    of any task's FPSIMD state.
> 
>  * loaded(current, smp_processor_id()) is made true by
>    fpsimd_bind_to_cpu().
> 
>    fpsimd_bind_to_cpu() also implies the effects of
>    fpsimd_flush_task_state(current) and
>    fpsimd_flush_cpu_state(smp_processor_id()) before the new relation is
>    established.  This is not explicit in the code, but falls out from
>    the way the relation is represented.
> 
> 
> ( There is a wrinkle here: fpsimd_flush_task_state(task) should always
> be followed by set_thread_flag(TIF_FOREIGN_FPSTATE) if task == current.
> fpsimd_flush_cpu_state() should similarly set that flag, otherwise the
> garbage left in the SVE bits by KVM's save/restore may spuriously
> appear in the vcpu thread's user regs.  But since that data will be (a)
> zeros or (b) the task's own data; and because TIF_SVE is cleared in
> entry.S:el0_svc as a side-effect of the ioctl(KVM_RUN) syscall, I don't
> think this matters in practice.
> 
> If we extend kvm_fpsimd_flush_cpu_state() to invalidate in the non-SVE
> case too then this becomes significant and we _would_ need to clear
> TIF_FOREIGN_FPSTATE to avoid the guest's FPSIMD regs appearing in the

clear?  Wouldn't we need to set it?

> vcpu user thread. )
> 
> </aside>
> 

Thanks for this, it's helpful.

What's missing for my understanding is when fpsimd_save_state() gets
called, which must be required in some cases of invalidating the
relation, since otherwise there must be a risk of losing state?

> > > The alternative would have been for KVM to save/restore the host SVE
> > > state directly, but this seemed premature and invasive in the absence
> > > of full KVM SVE support.
> > > 
> > > This means that KVM's own save/restore of the host's FPSIMD state
> > > becomes redundant in this case, but since there is no SVE hardware
> > > yet, I favoured correctness over optimal performance here.
> > > 
> > 
> > I agree with the approach, I guess I just can't seem to follow the code
> > correctly...
> 
> Understandable... even trying to remember how it works is giving me a
> headache :P
> 
> > 
> > > 
> > > My point here was that we could modify this hook to always save off the
> > > host FPSIMD state unconditionally before entering the guts of KVM,
> > > instead of only doing it when there is live SVE state.  The benefit of
> > > this is that the host context switch machinery knows if the state has
> > > already been saved and won't do it again.  Thus a kvm userspace -> vcpu
> > > (-> guest exit -> vcpu)* -> guest_exit sequence of arbitrary length
> > > will only save the host FPSIMD (or SVE) state once, and won't restore
> > > it at all (assuming no context switches).
> > > 
> > > Instead, the user thread's FPSIMD state is only reloaded on the final
> > > return to userspace.
> > > 
> > 
> > I think that would invert the logic we have now, so instead of only
> > saving/restoring the FPSIMD state when the guest uses it (as we do now),
> > we would only save/restore the FPSIMD state when the host uses it,
> > regardless of what the guest does.
> 
> I'm not sure that's a complete characterisation of what's going on,
> but I'm struggling to describe my view in simple terms.
> 
> > Ideally, we could have a combination of both, but it's unclear to me if
> > we have good indications that one case is more likely than the other.
> > 
> > My gut feeling though, is that the guest will be likely to often access
> > FPSIMD state for as long as we're in KVM_RUN, and that host userspace
> > also often uses FPSIMD (memcopy, etc.), but the rest of the host kernel
> > (kernel threads etc.) is unlikely to use FPSIMD for a system that is
> > primarily running VMs.
> 
> I think my suggestion:
> 
>  * neither adds nor elides any saves or restores of the guest context;
>  * elides some saves and restores of the host context;
>  * moves the host context save with respect to your series in those
>    cases where it does occur; and
>  * adds 1 host context save for each preempt-or-enter-userspace ...
>    preempt-or-enter-userspace interval of a vcpu thread during which
>    the guest does not use FPSIMD.
>   
> The last bullet is the only one that can add cost.  I can imagine
> hitting this during an I/O emulation storm.  I feel that most of the
> rest of the time the change would be a net win, but it's hard to gauge
> the overall impact.
> 

It's certainly possible to have a flow where the guest kernel is not
using FPSIMD and keeps bouncing back to host userspace which does FPSIMD
in memcpy().  This is a pretty likely case for small disk I/O, so I'm
not crazy about this.

> 
> Migrating to using the host context switch machinery as-is for
> managing the guest FPSIMD context would allow all the redundant
> saves/restores to be eliminated.
> 
> It would be a more invasive change though, and I don't think this
> series should attempt it.
> 

I agree that we should attempt to use the host machinery to switch
FPSIMD state for the guest state, as long as we can keep doing that
lazily for the guest state.  Not sure if it belongs in these patches or
not (probably not), but I think it would be helpful if we could write up
a patch to see how that would look.  I don't think any intermediate
optimizations are worth it at this point.

Thanks,
-Christoffer

Patch

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 0e9e7291a7e6..9e23bc968668 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -213,6 +213,9 @@  struct kvm_vcpu_arch {
 	/* Guest debug state */
 	u64 debug_flags;
 
+	/* 1 if the guest VFP state is loaded into the hardware */
+	u8 guest_vfp_loaded;
+
 	/*
 	 * We maintain more than a single set of debug registers to support
 	 * debugging the guest from the host and to maintain separate host and
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index 612021dce84f..99467327c043 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -133,6 +133,7 @@  int main(void)
   DEFINE(CPU_GP_REGS,		offsetof(struct kvm_cpu_context, gp_regs));
   DEFINE(CPU_USER_PT_REGS,	offsetof(struct kvm_regs, regs));
   DEFINE(CPU_FP_REGS,		offsetof(struct kvm_regs, fp_regs));
+  DEFINE(VCPU_GUEST_VFP_LOADED,	offsetof(struct kvm_vcpu, arch.guest_vfp_loaded));
   DEFINE(VCPU_FPEXC32_EL2,	offsetof(struct kvm_vcpu, arch.ctxt.sys_regs[FPEXC32_EL2]));
   DEFINE(VCPU_HOST_CONTEXT,	offsetof(struct kvm_vcpu, arch.host_cpu_context));
   DEFINE(HOST_CONTEXT_VCPU,	offsetof(struct kvm_cpu_context, __hyp_running_vcpu));
diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
index a360ac6e89e9..53652287a236 100644
--- a/arch/arm64/kvm/hyp/entry.S
+++ b/arch/arm64/kvm/hyp/entry.S
@@ -184,6 +184,9 @@  alternative_endif
 	add	x0, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
 	bl	__fpsimd_restore_state
 
+	mov	x0, #1
+	strb	w0, [x3, #VCPU_GUEST_VFP_LOADED]
+
 	// Skip restoring fpexc32 for AArch64 guests
 	mrs	x1, hcr_el2
 	tbnz	x1, #HCR_RW_SHIFT, 1f
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 12dc647a6e5f..29e44a20f5e3 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -24,43 +24,32 @@ 
 #include <asm/fpsimd.h>
 #include <asm/debug-monitors.h>
 
-static bool __hyp_text __fpsimd_enabled_nvhe(void)
-{
-	return !(read_sysreg(cptr_el2) & CPTR_EL2_TFP);
-}
-
-static bool __hyp_text __fpsimd_enabled_vhe(void)
-{
-	return !!(read_sysreg(cpacr_el1) & CPACR_EL1_FPEN);
-}
-
-static hyp_alternate_select(__fpsimd_is_enabled,
-			    __fpsimd_enabled_nvhe, __fpsimd_enabled_vhe,
-			    ARM64_HAS_VIRT_HOST_EXTN);
-
-bool __hyp_text __fpsimd_enabled(void)
-{
-	return __fpsimd_is_enabled()();
-}
-
-static void __hyp_text __activate_traps_vhe(void)
+static void __hyp_text __activate_traps_vhe(struct kvm_vcpu *vcpu)
 {
 	u64 val;
 
 	val = read_sysreg(cpacr_el1);
 	val |= CPACR_EL1_TTA;
-	val &= ~(CPACR_EL1_FPEN | CPACR_EL1_ZEN);
+	val &= ~CPACR_EL1_ZEN;
+	if (vcpu->arch.guest_vfp_loaded)
+		val |= CPACR_EL1_FPEN;
+	else
+		val &= ~CPACR_EL1_FPEN;
 	write_sysreg(val, cpacr_el1);
 
 	write_sysreg(__kvm_hyp_vector, vbar_el1);
 }
 
-static void __hyp_text __activate_traps_nvhe(void)
+static void __hyp_text __activate_traps_nvhe(struct kvm_vcpu *vcpu)
 {
 	u64 val;
 
 	val = CPTR_EL2_DEFAULT;
-	val |= CPTR_EL2_TTA | CPTR_EL2_TFP | CPTR_EL2_TZ;
+	val |= CPTR_EL2_TTA | CPTR_EL2_TZ;
+	if (vcpu->arch.guest_vfp_loaded)
+		val &= ~CPTR_EL2_TFP;
+	else
+		val |= CPTR_EL2_TFP;
 	write_sysreg(val, cptr_el2);
 }
 
@@ -79,7 +68,8 @@  static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
 	 * If FP/ASIMD is not implemented, FPEXC is UNDEFINED and any access to
 	 * it will cause an exception.
 	 */
-	if (vcpu_el1_is_32bit(vcpu) && system_supports_fpsimd()) {
+	if (vcpu_el1_is_32bit(vcpu) && system_supports_fpsimd() &&
+	    !vcpu->arch.guest_vfp_loaded) {
 		write_sysreg(1 << 30, fpexc32_el2);
 		isb();
 	}
@@ -96,7 +86,7 @@  static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
 	write_sysreg(0, pmselr_el0);
 	write_sysreg(ARMV8_PMU_USERENR_MASK, pmuserenr_el0);
 	write_sysreg(vcpu->arch.mdcr_el2, mdcr_el2);
-	__activate_traps_arch()();
+	__activate_traps_arch()(vcpu);
 }
 
 static void __hyp_text __deactivate_traps_vhe(void)
@@ -284,7 +274,6 @@  int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
 {
 	struct kvm_cpu_context *host_ctxt;
 	struct kvm_cpu_context *guest_ctxt;
-	bool fp_enabled;
 	u64 exit_code;
 
 	vcpu = kern_hyp_va(vcpu);
@@ -376,8 +365,6 @@  int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
 		/* 0 falls through to be handled out of EL2 */
 	}
 
-	fp_enabled = __fpsimd_enabled();
-
 	__sysreg_save_guest_state(guest_ctxt);
 	__sysreg32_save_state(vcpu);
 	__timer_disable_traps(vcpu);
@@ -388,11 +375,6 @@  int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
 
 	__sysreg_restore_host_state(host_ctxt);
 
-	if (fp_enabled) {
-		__fpsimd_save_state(&guest_ctxt->gp_regs.fp_regs);
-		__fpsimd_restore_state(&host_ctxt->gp_regs.fp_regs);
-	}
-
 	__debug_save_state(vcpu, kern_hyp_va(vcpu->arch.debug_ptr), guest_ctxt);
 	/*
 	 * This must come after restoring the host sysregs, since a non-VHE
diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
index 883a6383cd36..848a46eb33bf 100644
--- a/arch/arm64/kvm/hyp/sysreg-sr.c
+++ b/arch/arm64/kvm/hyp/sysreg-sr.c
@@ -138,6 +138,11 @@  void __hyp_text __sysreg_restore_guest_state(struct kvm_cpu_context *ctxt)
 	__sysreg_restore_common_state(ctxt);
 }
 
+static void __hyp_text __fpsimd32_save_state(struct kvm_cpu_context *ctxt)
+{
+	ctxt->sys_regs[FPEXC32_EL2] = read_sysreg(fpexc32_el2);
+}
+
 void __hyp_text __sysreg32_save_state(struct kvm_vcpu *vcpu)
 {
 	u64 *spsr, *sysreg;
@@ -156,9 +161,6 @@  void __hyp_text __sysreg32_save_state(struct kvm_vcpu *vcpu)
 	sysreg[DACR32_EL2] = read_sysreg(dacr32_el2);
 	sysreg[IFSR32_EL2] = read_sysreg(ifsr32_el2);
 
-	if (__fpsimd_enabled())
-		sysreg[FPEXC32_EL2] = read_sysreg(fpexc32_el2);
-
 	if (vcpu->arch.debug_flags & KVM_ARM64_DEBUG_DIRTY)
 		sysreg[DBGVCR32_EL2] = read_sysreg(dbgvcr32_el2);
 }
@@ -213,6 +215,19 @@  void kvm_vcpu_load_sysregs(struct kvm_vcpu *vcpu)
  */
 void kvm_vcpu_put_sysregs(struct kvm_vcpu *vcpu)
 {
+	struct kvm_cpu_context *host_ctxt = vcpu->arch.host_cpu_context;
+	struct kvm_cpu_context *guest_ctxt = &vcpu->arch.ctxt;
+
+	/* Restore host FP/SIMD state */
+	if (vcpu->arch.guest_vfp_loaded) {
+		if (vcpu_el1_is_32bit(vcpu)) {
+			kvm_call_hyp(__fpsimd32_save_state,
+				     kern_hyp_va(guest_ctxt));
+		}
+		__fpsimd_save_state(&guest_ctxt->gp_regs.fp_regs);
+		__fpsimd_restore_state(&host_ctxt->gp_regs.fp_regs);
+		vcpu->arch.guest_vfp_loaded = 0;
+	}
 }
 
 void __hyp_text __kvm_set_tpidr_el2(u64 tpidr_el2)