mbox series

[00/11] KVM: arm64: nv: FPSIMD/SVE support

Message ID 20240531231358.1000039-1-oliver.upton@linux.dev (mailing list archive)
Headers show
Series KVM: arm64: nv: FPSIMD/SVE support | expand

Message

Oliver Upton May 31, 2024, 11:13 p.m. UTC
Hey!

I've decided to start messing around with nested and have SVE support
working for a nested guest. For the sake of landing a semi-complete
feature upstream, I've also picked up the FPSIMD patches from the NV
series Marc is carrying.

The most annoying part about this series (IMO) is that ZCR_EL2 traps
behave differently from what needs to be virtualized for the guest when
HCR_EL2.NV = 1, as it takes a sysreg trap (EC = 0x18) instead of an SVE
trap (EC = 0x19). So, we need to synthesize the ESR value when
reflecting back into the guest hypervisor.

Otherwise, some care is required to slap the guest hypervisor's ZCR_EL2
into the right place depending on whether or not the vCPU is in a hyp
context, since it affects the hyp's usage of SVE in addition to the VM.

There's more work to be done for honoring the L1's CPTR traps, as this
series only focuses on getting SVE and FPSIMD traps right. We'll get
there one day.

I tested this using a mix of the fpsimd-test and sve-test selftests
running at L0, L1, and L2 concurrently on Neoverse V2.

Jintack Lim (1):
  KVM: arm64: nv: Forward FP/ASIMD traps to guest hypervisor

Oliver Upton (10):
  KVM: arm64: nv: Forward SVE traps to guest hypervisor
  KVM: arm64: nv: Load guest FP state for ZCR_EL2 trap
  KVM: arm64: nv: Load guest hyp's ZCR into EL1 state
  KVM: arm64: nv: Handle ZCR_EL2 traps
  KVM: arm64: nv: Save guest's ZCR_EL2 when in hyp context
  KVM: arm64: nv: Use guest hypervisor's max VL when running nested
    guest
  KVM: arm64: nv: Ensure correct VL is loaded before saving SVE state
  KVM: arm64: Spin off helper for programming CPTR traps
  KVM: arm64: nv: Honor guest hypervisor's FP/SVE traps in CPTR_EL2
  KVM: arm64: Allow the use of SVE+NV

 arch/arm64/include/asm/kvm_emulate.h    | 47 +++++++++++++++++++
 arch/arm64/include/asm/kvm_host.h       |  7 +++
 arch/arm64/include/asm/kvm_nested.h     |  1 -
 arch/arm64/kvm/arm.c                    |  5 --
 arch/arm64/kvm/fpsimd.c                 | 22 +++++++--
 arch/arm64/kvm/handle_exit.c            | 19 ++++++--
 arch/arm64/kvm/hyp/include/hyp/switch.h | 43 ++++++++++++++++-
 arch/arm64/kvm/hyp/vhe/switch.c         | 62 +++++++++++++++----------
 arch/arm64/kvm/nested.c                 |  3 +-
 arch/arm64/kvm/sys_regs.c               | 40 ++++++++++++++++
 10 files changed, 206 insertions(+), 43 deletions(-)


base-commit: 1613e604df0cd359cf2a7fbd9be7a0bcfacfabd0

Comments

Marc Zyngier June 1, 2024, 10:24 a.m. UTC | #1
On Sat, 01 Jun 2024 00:13:47 +0100,
Oliver Upton <oliver.upton@linux.dev> wrote:
> 
> Hey!
> 
> I've decided to start messing around with nested and have SVE support
> working for a nested guest. For the sake of landing a semi-complete
> feature upstream, I've also picked up the FPSIMD patches from the NV
> series Marc is carrying.
> 
> The most annoying part about this series (IMO) is that ZCR_EL2 traps
> behave differently from what needs to be virtualized for the guest when
> HCR_EL2.NV = 1, as it takes a sysreg trap (EC = 0x18) instead of an SVE
> trap (EC = 0x19). So, we need to synthesize the ESR value when
> reflecting back into the guest hypervisor.

That's unfortunately not a unique case. The ERETAx emulation already
requires us to synthesise the ESR on PAC check failure, and I'm afraid
ZCR_EL2 might not be the last case.

In general, we'll see this problem for any instruction or sysreg that
can generate multiple exception classes.

>
> Otherwise, some care is required to slap the guest hypervisor's ZCR_EL2
> into the right place depending on whether or not the vCPU is in a hyp
> context, since it affects the hyp's usage of SVE in addition to the VM.
> 
> There's more work to be done for honoring the L1's CPTR traps, as this
> series only focuses on getting SVE and FPSIMD traps right. We'll get
> there one day.

I have patches for that in my NV series, which would take the place of
patches 9 and 10 in your series (or supplement them, depending on how
we want to slice this).

> 
> I tested this using a mix of the fpsimd-test and sve-test selftests
> running at L0, L1, and L2 concurrently on Neoverse V2.

Thanks a lot for tackling this. It'd be good to put together a series
that has the EL2 sysreg save/restore patches as a prefix of this, plus
the CPTR_EL2 changes. That way, we'd have something that can be merged
as a consistent set.

I'll try to take this into my branch and see what explodes!

Cheers,

	M.
Oliver Upton June 1, 2024, 4:57 p.m. UTC | #2
On Sat, Jun 01, 2024 at 11:24:49AM +0100, Marc Zyngier wrote:
> On Sat, 01 Jun 2024 00:13:47 +0100,
> Oliver Upton <oliver.upton@linux.dev> wrote:
> > 
> > Hey!
> > 
> > I've decided to start messing around with nested and have SVE support
> > working for a nested guest. For the sake of landing a semi-complete
> > feature upstream, I've also picked up the FPSIMD patches from the NV
> > series Marc is carrying.
> > 
> > The most annoying part about this series (IMO) is that ZCR_EL2 traps
> > behave differently from what needs to be virtualized for the guest when
> > HCR_EL2.NV = 1, as it takes a sysreg trap (EC = 0x18) instead of an SVE
> > trap (EC = 0x19). So, we need to synthesize the ESR value when
> > reflecting back into the guest hypervisor.
> 
> That's unfortunately not a unique case. The ERETAx emulation already
> requires us to synthesise the ESR on PAC check failure, and I'm afraid
> ZCR_EL2 might not be the last case.
> 
> In general, we'll see this problem for any instruction or sysreg that
> can generate multiple exception classes.

Right, I didn't have a good feel yet for whether or not we could add
some generalized infrastructure for 'remapping' ESR values for the guest
hypervisor. Of course, not needed for this, but cooking up an ISS is
likely to require a bit of manual intervention.

> > Otherwise, some care is required to slap the guest hypervisor's ZCR_EL2
> > into the right place depending on whether or not the vCPU is in a hyp
> > context, since it affects the hyp's usage of SVE in addition to the VM.
> > 
> > There's more work to be done for honoring the L1's CPTR traps, as this
> > series only focuses on getting SVE and FPSIMD traps right. We'll get
> > there one day.
> 
> I have patches for that in my NV series, which would take the place of
> patches 9 and 10 in your series (or supplement them, depending on how
> we want to slice this).

That'd be great, I just wanted to post something focused on FP/SVE to
start but...

> > 
> > I tested this using a mix of the fpsimd-test and sve-test selftests
> > running at L0, L1, and L2 concurrently on Neoverse V2.
> 
> Thanks a lot for tackling this. It'd be good to put together a series
> that has the EL2 sysreg save/restore patches as a prefix of this, plus
> the CPTR_EL2 changes. That way, we'd have something that can be merged
> as a consistent set.

I'd be happy to stitch together something like this to round out the
feature. I deliberately left out the handling of vEL2 registers because
of the CPACR_EL1 v. CPTR_EL2 mess, but we may as well sort that out.

Did you want to post your CPTR bits when you have a chance?
Marc Zyngier June 2, 2024, 2:28 p.m. UTC | #3
On Sat, 01 Jun 2024 17:57:31 +0100,
Oliver Upton <oliver.upton@linux.dev> wrote:
> 
> On Sat, Jun 01, 2024 at 11:24:49AM +0100, Marc Zyngier wrote:
> > On Sat, 01 Jun 2024 00:13:47 +0100,
> > Oliver Upton <oliver.upton@linux.dev> wrote:
> > > 
> > > Hey!
> > > 
> > > I've decided to start messing around with nested and have SVE support
> > > working for a nested guest. For the sake of landing a semi-complete
> > > feature upstream, I've also picked up the FPSIMD patches from the NV
> > > series Marc is carrying.
> > > 
> > > The most annoying part about this series (IMO) is that ZCR_EL2 traps
> > > behave differently from what needs to be virtualized for the guest when
> > > HCR_EL2.NV = 1, as it takes a sysreg trap (EC = 0x18) instead of an SVE
> > > trap (EC = 0x19). So, we need to synthesize the ESR value when
> > > reflecting back into the guest hypervisor.
> > 
> > That's unfortunately not a unique case. The ERETAx emulation already
> > requires us to synthesise the ESR on PAC check failure, and I'm afraid
> > ZCR_EL2 might not be the last case.
> > 
> > In general, we'll see this problem for any instruction or sysreg that
> > can generate multiple exception classes.
> 
> Right, I didn't have a good feel yet for whether or not we could add
> some generalized infrastructure for 'remapping' ESR values for the guest
> hypervisor. Of course, not needed for this, but cooking up an ISS is
> likely to require a bit of manual intervention.

So far, it is pretty limited, only takes a couple of lines of code,
and is likely to always be coupled with some more complicated handling
(I don't see this being *only* a quick ESR remapping).

> > > Otherwise, some care is required to slap the guest hypervisor's ZCR_EL2
> > > into the right place depending on whether or not the vCPU is in a hyp
> > > context, since it affects the hyp's usage of SVE in addition to the VM.
> > > 
> > > There's more work to be done for honoring the L1's CPTR traps, as this
> > > series only focuses on getting SVE and FPSIMD traps right. We'll get
> > > there one day.
> > 
> > I have patches for that in my NV series, which would take the place of
> > patches 9 and 10 in your series (or supplement them, depending on how
> > we want to slice this).
> 
> That'd be great, I just wanted to post something focused on FP/SVE to
> start but...
> 
> > > 
> > > I tested this using a mix of the fpsimd-test and sve-test selftests
> > > running at L0, L1, and L2 concurrently on Neoverse V2.
> > 
> > Thanks a lot for tackling this. It'd be good to put together a series
> > that has the EL2 sysreg save/restore patches as a prefix of this, plus
> > the CPTR_EL2 changes. That way, we'd have something that can be merged
> > as a consistent set.
> 
> I'd be happy to stitch together something like this to round out the
> feature. I deliberately left out the handling of vEL2 registers because
> of the CPACR_EL1 v. CPTR_EL2 mess, but we may as well sort that out.
> 
> Did you want to post your CPTR bits when you have a chance?

Yup, I'll rework that on top of your series and we'll take it from
there.

Thanks,

	M.