[v4,0/4] KVM: arm64: Fix underallocation of storage for SVE state

Message ID 20240704-kvm-arm64-fix-pkvm-sve-vl-v4-0-b6898ab23dc4@kernel.org (mailing list archive)

Message

Mark Brown July 4, 2024, 5:28 p.m. UTC
As observed during review, the pKVM support for saving host SVE state is
broken if an asymmetric system has VLs larger than the maximum shared
VL. Fix this by discovering and then using the maximum VL for
allocations, and by using RDVL during the save/restore process.
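
As a rough illustration of the allocation side of this (a sketch only,
with made-up names rather than the symbols added by these patches):

/*
 * Illustrative only: size the per-CPU host SVE state buffer by the
 * largest VL implemented on any CPU, rather than by the largest VL
 * shared by all CPUs.
 */
#include <stddef.h>

#define SVE_VQ_BYTES	16	/* one VQ is 128 bits */

/* Largest VL (in bytes) seen on any CPU during bring-up */
static unsigned int sve_max_vl_any_cpu;

static void cpu_note_sve_vl(unsigned int this_cpu_max_vl)
{
	if (this_cpu_max_vl > sve_max_vl_any_cpu)
		sve_max_vl_any_cpu = this_cpu_max_vl;
}

/* Z regs, P regs and FFR for a given VL, per the architectural layout */
static size_t sve_state_size_for_vl(unsigned int vl)
{
	size_t vq = vl / SVE_VQ_BYTES;

	return 32 * vq * SVE_VQ_BYTES +	/* Z0-Z31 */
	       16 * vq * 2 +		/* P0-P15 */
	       vq * 2;			/* FFR */
}

/* The host save area must be sized for the worst case across all CPUs */
static size_t host_sve_state_size(void)
{
	return sve_state_size_for_vl(sve_max_vl_any_cpu);
}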

Signed-off-by: Mark Brown <broonie@kernel.org>
---
Changes in v4:
- Roll in Catalin's acks.
- Link to v3: https://lore.kernel.org/r/20240607-kvm-arm64-fix-pkvm-sve-vl-v3-0-59e781706d65@kernel.org

Changes in v3:
- Replace %u with %lu in late CPU error message.
Changes in v2:
- Downgrade check for a late CPU increasing maximum VL to a warning only
  but do it unconditionally since pKVM prevents late CPUs anyway.
- Commit log tweaks.
- Link to v1: https://lore.kernel.org/r/20240605-kvm-arm64-fix-pkvm-sve-vl-v1-0-680d6b43b4c1@kernel.org

---
Mark Brown (4):
      arm64/fpsimd: Introduce __bit_to_vl() helper
      arm64/fpsimd: Discover maximum vector length implemented by any CPU
      KVM: arm64: Fix FFR offset calculation for pKVM host state save and restore
      KVM: arm64: Avoid underallocating storage for host SVE state

 arch/arm64/include/asm/fpsimd.h         | 17 +++++++++++++++
 arch/arm64/include/asm/kvm_host.h       |  2 +-
 arch/arm64/include/asm/kvm_hyp.h        |  3 ++-
 arch/arm64/include/asm/kvm_pkvm.h       |  2 +-
 arch/arm64/kernel/fpsimd.c              | 38 +++++++++++++++++++++++++++------
 arch/arm64/kvm/hyp/fpsimd.S             |  5 +++++
 arch/arm64/kvm/hyp/include/hyp/switch.h |  2 +-
 arch/arm64/kvm/hyp/nvhe/hyp-main.c      |  6 +++---
 arch/arm64/kvm/hyp/nvhe/pkvm.c          |  2 +-
 arch/arm64/kvm/reset.c                  |  6 +++---
 10 files changed, 65 insertions(+), 18 deletions(-)
---
base-commit: afb91f5f8ad7af172d993a34fde1947892408f53
change-id: 20240604-kvm-arm64-fix-pkvm-sve-vl-13cd71fd7db0

Best regards,

Comments

Marc Zyngier July 5, 2024, 1:20 p.m. UTC | #1
On Thu, 04 Jul 2024 18:28:15 +0100,
Mark Brown <broonie@kernel.org> wrote:
> 
> As observed during review the pKVM support for saving host SVE state is
> broken if an asymmetric system has VLs larger than the maximum shared
> VL, fix this by discovering then using the maximum VL for allocations
> and using RDVL during the save/restore process.

I really don't see why we need such complexity here.

Fuad did post something[1] that did the trick with a far less invasive
change, and it really feels like we are putting the complexity at the
wrong place.

So what's wrong with that approach? I get that you want to shout about
secondary CPUs, but that's an orthogonal problem.

	M.

[1] https://lore.kernel.org/r/20240606092623.2236172-1-tabba@google.com
Mark Brown July 5, 2024, 5:18 p.m. UTC | #2
On Fri, Jul 05, 2024 at 02:20:05PM +0100, Marc Zyngier wrote:
> Mark Brown <broonie@kernel.org> wrote:

> > As observed during review the pKVM support for saving host SVE state is
> > broken if an asymmetric system has VLs larger than the maximum shared
> > VL, fix this by discovering then using the maximum VL for allocations
> > and using RDVL during the save/restore process.

> I really don't see why we need such complexity here.

> Fuad did post something[1] that did the trick with a far less invasive
> change, and it really feels like we are putting the complexity at the
> wrong place.

> So what's wrong with that approach? I get that you want to shout about
> secondary CPUs, but that's an orthogonal problem.

As I've said, from a clarity/fragility point of view I'm not happy with
configuring the vector length to one value and then immediately doing
things that assume another value, even if everything is actually lined
up in a way that works.  Having uncommented code whose correctness you
have to go and check whenever you see it isn't great; seeing an
inconsistency just raises alarm bells.  It is much clearer to write the
code in a way that makes it obvious that the VL we are using is the one
the hardware is using; for the host save/restore, reading the actual VL
back seemed like the most straightforward way to do that.
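
For illustration, "reading the actual VL back" amounts to something like
the following minimal sketch (hypothetical helper name, assuming an
SVE-capable toolchain; not the code in the series):

/*
 * Illustrative only: ask the hardware for the currently configured
 * vector length rather than assuming a value configured elsewhere.
 */
static inline unsigned long sve_get_hw_vl(void)
{
	unsigned long vl;

	/* RDVL Xd, #imm returns imm * (VL in bytes); imm = 1 gives the VL */
	asm volatile(".arch_extension sve\n\trdvl %0, #1" : "=r" (vl));

	return vl;
}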

A similar thing applies to the enumeration code - as I said in reply to
one of Fuad's postings, I originally wrote something that's basically
the same as the patch Fuad posted, but because it is not consistent with
the surrounding code in how it approaches things it just raised
questions about whether the new code was missing something, or whether
there was some problem that should be addressed in the existing code.
Rather than write an extensive changelog and/or comments covering these
considerations it seemed better to just write the code in a consistent
manner so the questions aren't prompted.  Keeping the approach
consistent is a bit more code right now but makes the result much easier
to reason about.

The late CPUs thing is really just an answer to the initial "why is this
different, what might we have missed?" question rather than a particular
goal itself.  Adding a warning is as much about documenting the handling
of late CPUs as it is about highlighting any unfortunate implementation
choices we run into.

Basically it's maintainability concerns, especially with the enumeration
code.
Dave Martin July 8, 2024, 3:30 p.m. UTC | #3
Hi all,

On Fri, Jul 05, 2024 at 06:18:50PM +0100, Mark Brown wrote:
> On Fri, Jul 05, 2024 at 02:20:05PM +0100, Marc Zyngier wrote:
> > Mark Brown <broonie@kernel.org> wrote:
> 
> > > As observed during review the pKVM support for saving host SVE state is
> > > broken if an asymmetric system has VLs larger than the maximum shared
> > > VL, fix this by discovering then using the maximum VL for allocations
> > > and using RDVL during the save/restore process.
> 
> > I really don't see why we need such complexity here.

The first patch is orthogonal cleanup, and the rest doesn't really add
complexity IIUC.

> > Fuad did post something[1] that did the trick with a far less invasive
> > change, and it really feels like we are putting the complexity at the
> > wrong place.
> 
> > So what's wrong with that approach? I get that you want to shout about
> > secondary CPUs, but that's an orthogonal problem.
> 
> As I've said from a clarity/fragility point of view I'm not happy with
> configuring the vector length to one value then immediately doing things
> that assume another value, even if everything is actually lined up
> in a way that works.  Having uncommented code where you have to go and
> check correctness when you see it isn't great, seeing an inconsistency
> just raises alarm bells.  It is much clearer to write the code in a way
> that makes it obvious that the VL we are using is the one the hardware
> is using, for the host save/restore reading the actual VL back seemed
> like the most straightforward way to do that.
> 
> A similar thing applies with the enumeration code - like I said in reply
> to one of Fuad's postings I originally wrote something that's basically
> the same as the patch Fuad posted but because it is not consistent with
> the surrounding code in how it approaches things it was just raising
> questions about if the new code was missing something, or if there was
> some problem that should be addressed in the existing code.  Rather than
> write an extensive changelog and/or comments covering these
> considerations it seemed better to just write the code in a
> consistent manner so the questions aren't prompted.  Keeping the
> approach consistent is a bit more code right now but makes the result
> much easier to reason about.
> 
> The late CPUs thing is really just an answer to the initial "why is this
> different, what might we have missed?" question rather than a particular
> goal itself.  Adding a warning is as much about documenting the handling
> of late CPUs as it is about highlighting any unfortunate implementation
> choices we run into.
> 
> Basically it's maintainability concerns, especially with the enumeration
> code.

I tend to agree here.

It's probably best to stick to one convention everywhere about how the
SVE regs are laid out for a given VL.  There's nothing wrong with Fuad's
fixed sve_ffr_offset(), but it's different from the VL-dependent offset
already used elsewhere and so risks causing confusion further down the
line.
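
(For reference, the VL-dependent convention boils down to the following;
this is a sketch using the architectural sizes, not the kernel's actual
helpers:)

/*
 * Illustrative only: in the usual layout the Z registers come first,
 * then the P registers, then FFR, so FFR's offset depends on the VL.
 * vq = VL in 128-bit quadwords, i.e. VL in bytes / 16.
 */
#include <stddef.h>

static inline size_t sve_z_regs_size(unsigned int vq)
{
	return 32 * (size_t)vq * 16;	/* Z0-Z31, 16 bytes per quadword */
}

static inline size_t sve_p_regs_size(unsigned int vq)
{
	return 16 * (size_t)vq * 2;	/* P0-P15, 2 bytes per quadword */
}

/* FFR follows the Z and P registers, so its offset is VL-dependent */
static inline size_t sve_ffr_offset_for_vq(unsigned int vq)
{
	return sve_z_regs_size(vq) + sve_p_regs_size(vq);
}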


One thing confuses me:

The host could never use over-max VLs except in non-preemptible kernel
code, since code doing that would be non-migratable to other physical
CPUs.  This is done to probe SVE only, and the extra bits in the vector
registers are never used at all.

Can't pKVM just hide the non symmetrically supported VLs using ZCR_EL2,
just as regular KVM does for the guest?

(I may be making bad assumptions about pKVM's relationship with the host
kernel.)

Cheers
---Dave
Mark Brown July 8, 2024, 6:12 p.m. UTC | #4
On Mon, Jul 08, 2024 at 04:30:30PM +0100, Dave Martin wrote:

> One thing confuses me:

> The host could never use over-max VLs except in non-preemptible kernel
> code, since code doing that would be non-migratable to other physical
> CPUs.  This is done to probe SVE only, and the extra bits in the vector
> registers are never used at all.

> Can't pKVM just hide the non symmetrically supported VLs using ZCR_EL2,
> just as regular KVM does for the guest?

> (I may be making bad assumptions about pKVM's relationship with the host
> kernel.)

That'd be another way to do it: constrain the VLs the host can set - I
assume there's something about how pKVM or hVHE does things that makes
that a problem.
Mark Brown Sept. 4, 2024, 3:48 p.m. UTC | #5
On Mon, Jul 08, 2024 at 04:30:30PM +0100, Dave Martin wrote:
> On Fri, Jul 05, 2024 at 06:18:50PM +0100, Mark Brown wrote:
> > On Fri, Jul 05, 2024 at 02:20:05PM +0100, Marc Zyngier wrote:
> > > Mark Brown <broonie@kernel.org> wrote:

> > > > As observed during review the pKVM support for saving host SVE state is
> > > > broken if an asymmetric system has VLs larger than the maximum shared
> > > > VL, fix this by discovering then using the maximum VL for allocations
> > > > and using RDVL during the save/restore process.

> > > I really don't see why we need such complexity here.

> The first patch is orthogonal cleanup, and the rest doesn't really add
> complexity IIUC.

...

> > Basically it's maintainability concerns, especially with the enumeration
> > code.

> I tend to agree here.

Did anyone have any further thoughts on this?  It's been roughly a
release cycle since I originally posted this, and there have been no
changes requested since -rc1 (which was itself just a rebase).

> The host could never use over-max VLs except in non-preemptible kernel
> code, since code doing that would be non-migratable to other physical
> CPUs.  This is done to probe SVE only, and the extra bits in the vector
> registers are never used at all.

> Can't pKVM just hide the non symmetrically supported VLs using ZCR_EL2,
> just as regular KVM does for the guest?

> (I may be making bad assumptions about pKVM's relationship with the host
> kernel.)

That's one for the pKVM people.
Fuad Tabba Sept. 6, 2024, 3:35 p.m. UTC | #6
Hi,

> > Can't pKVM just hide the non symmetrically supported VLs using ZCR_EL2,
> > just as regular KVM does for the guest?
>
> > (I may be making bad assumptions about pKVM's relationship with the host
> > kernel.)
>
> That's one for the pKVM people.

Yes, but that's not really the issue here, unless I'm missing
something else. The issue is that pKVM needs to store the host's SVE
state too, to be able to restore it. So hiding non-symmetrically
supported VLs for the guests shouldn't be relevant.

Cheers,
/fuad
Mark Brown Sept. 6, 2024, 4:10 p.m. UTC | #7
On Fri, Sep 06, 2024 at 04:35:29PM +0100, Fuad Tabba wrote:

> > > Can't pKVM just hide the non symmetrically supported VLs using ZCR_EL2,
> > > just as regular KVM does for the guest?

> > > (I may be making bad assumptions about pKVM's relationship with the host
> > > kernel.)

> > That's one for the pKVM people.

> Yes, but that's not really the issue here, unless I'm missing
> something else. The issue is that pKVM needs to store the host's SVE
> state too, to be able to restore it. So hiding non-symmetrically
> supported VLs for the guests shouldn't be relevant.

If the host kernel is also running at EL1 and it's only the hypervisor
running at EL2 then the hypervisor can control the VLs that the host
kernel is able to access?
Fuad Tabba Sept. 6, 2024, 4:14 p.m. UTC | #8
On Fri, 6 Sept 2024 at 17:10, Mark Brown <broonie@kernel.org> wrote:
>
> On Fri, Sep 06, 2024 at 04:35:29PM +0100, Fuad Tabba wrote:
>
> > > > Can't pKVM just hide the non symmetrically supported VLs using ZCR_EL2,
> > > > just as regular KVM does for the guest?
>
> > > > (I may be making bad assumptions about pKVM's relationship with the host
> > > > kernel.)
>
> > > That's one for the pKVM people.
>
> > Yes, but that's not really the issue here, unless I'm missing
> > something else. The issue is that pKVM needs to store the host's SVE
> > state too, to be able to restore it. So hiding non-symmetrically
> > supported VLs for the guests shouldn't be relevant.
>
> If the host kernel is also running at EL1 and it's only the hypervisor
> running at EL2 then the hypervisor can control the VLs that the host
> kernel is able to access?

Yes it can. But do we want to impose limits on host VLs when running
pKVM that might not exist without pKVM?

Although AFAIK, such hardware doesn't exist in practice, the reason we
went down this rabbit hole from the beginning was to ensure that we
wouldn't run into problems if it were to happen.

Thanks,
/fuad
Mark Brown Sept. 6, 2024, 6:02 p.m. UTC | #9
On Fri, Sep 06, 2024 at 05:14:09PM +0100, Fuad Tabba wrote:
> On Fri, 6 Sept 2024 at 17:10, Mark Brown <broonie@kernel.org> wrote:
> > On Fri, Sep 06, 2024 at 04:35:29PM +0100, Fuad Tabba wrote:

> > > Yes, but that's not really the issue here, unless I'm missing
> > > something else. The issue is that pKVM needs to store the host's SVE
> > > state too, to be able to restore it. So hiding non-symmetrically
> > > supported VLs for the guests shouldn't be relevant.

> > If the host kernel is also running at EL1 and it's only the hypervisor
> > running at EL2 then the hypervisor can control the VLs that the host
> > kernel is able to access?

> Yes it can. But do we want to impose limits on host VLs when running
> pKVM that might not exist without pKVM?

I mean, at the minute the host kernel will just not configure any
non-shared VLs, so pKVM isn't making a difference anyway.  Even when we
add kernel mode SVE usage, kernel mode FP is preemptible and
schedulable, so we'd be unlikely to want to use non-shared VLs there
either.  If someone ever felt moved to add support for using any
non-shared VLs, the most likely usage would be for userspace, and we'd
constrain any tasks using SVE to the cores that support their VLs,
similar to how we handle CPUs with no 32-bit support.  Hopefully the
scheduler would cope well with that.

> Although AFAIK, such hardware doesn't exist in practice, the reason we
> went down this rabbit hole from the beginning was to ensure that we
> wouldn't run into problems if it were to happen.

Yes, it's not an issue with any presently known hardware - the issue is
leaving nasty surprises should someone build it rather than anything
immediately practical.  Ideally they won't.

My general feeling is that it would have been perfectly fine for pKVM to
enforce what the host kernel wants to do anyway; if we ever do add
support for using asymmetric VLs and care about doing so on a system
running pKVM, then dealing with pKVM-imposed limits at that time seems
more than reasonable.  It probably wouldn't be the largest part of the
work.  Equally, we now have the code, so we may as well use it?  It's
not imposing huge overheads.
Fuad Tabba Sept. 10, 2024, 12:49 p.m. UTC | #10
Hi,

On Fri, 6 Sept 2024 at 19:03, Mark Brown <broonie@kernel.org> wrote:
>
> On Fri, Sep 06, 2024 at 05:14:09PM +0100, Fuad Tabba wrote:
> > On Fri, 6 Sept 2024 at 17:10, Mark Brown <broonie@kernel.org> wrote:
> > > On Fri, Sep 06, 2024 at 04:35:29PM +0100, Fuad Tabba wrote:
>
> > > > Yes, but that's not really the issue here, unless I'm missing
> > > > something else. The issue is that pKVM needs to store the host's SVE
> > > > state too, to be able to restore it. So hiding non-symmetrically
> > > > supported VLs for the guests shouldn't be relevant.
>
> > > If the host kernel is also running at EL1 and it's only the hypervisor
> > > running at EL2 then the hypervisor can control the VLs that the host
> > > kernel is able to access?
>
> > Yes it can. But do we want to impose limits on host VLs when running
> > pKVM that might not exist without pKVM?
>
> I mean, at the minute the host kernel will just not configure any
> non-shared VLs so pKVM isn't making a difference anyway.  Even when we
> add kernel mode SVE usage kernel mode FP is preemptable and schedulable
> so we'd not likely want to use non-shared VLs there either.  If someone
> ever felt moved to add support for using any non-shared VLs the most
> likely usage would be for userspace and we'd constrain any tasks using
> SVE to the cores that support their VLs similar to how we handle CPUs
> with no 32 bit support.  Hopefully the scheduler would cope well with
> that.
>
> > Although AFAIK, such hardware doesn't exist in practice, the reason we
> > went down this rabbit hole from the beginning was to ensure that we
> > wouldn't run into problems if it were to happen.
>
> Yes, it's not an issue with any presently known hardware - the issue is
> leaving nasty surprises should someone build it rather than anything
> immediately practical.  Ideally they won't.
>
> My general feeling is that it would have been perfectly fine for pKVM to
> enforce what the host kernel wants to do anyway, if we ever do add
> support for using asymmetric VLs and care about doing so on a system
> running pKVM then dealing with pKVM imposed limits at that time seems
> more than reasonable.  It probably wouldn't be the largest part of the
> work.  Equally we now have the code so we may as well use it?  It's not
> imposing huge overheads.

From pKVM's side, this would work, and other than the potential for
future support for using asymmetric VLs, I don't really see a problem.
Much of the complexity was an attempt not to make pKVM more restrictive
than other modes.

Cheers,
/fuad
Mark Brown Sept. 10, 2024, 2:19 p.m. UTC | #11
On Tue, Sep 10, 2024 at 01:49:37PM +0100, Fuad Tabba wrote:
> On Fri, 6 Sept 2024 at 19:03, Mark Brown <broonie@kernel.org> wrote:

> > My general feeling is that it would have been perfectly fine for pKVM to
> > enforce what the host kernel wants to do anyway, if we ever do add
> > support for using asymmetric VLs and care about doing so on a system
> > running pKVM then dealing with pKVM imposed limits at that time seems
> > more than reasonable.  It probably wouldn't be the largest part of the
> > work.  Equally we now have the code so we may as well use it?  It's not
> > imposing huge overheads.

> From pKVM, this would work and other than the potential for future
> support for using asymmetric VLs, I don't really see a problem. Much
> of the complexity was an attempt not to make pKVM more restrictive
> than other modes.

Right, so it's just a question of whether we want to use the code that
doesn't impose the limit, given that we have it already.  I'm throwing a
patch that limits the host VL into CI; it should be out today.
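
For reference, the "limit the host VL" approach discussed above amounts
to clamping ZCR_EL2.LEN to the maximum shared VL; a rough illustration
(hypothetical names and details, not the actual patch, and assuming an
SVE-aware assembler) might look like:

/*
 * Illustrative only: the effective VL at EL1 is constrained by
 * ZCR_EL2.LEN, which encodes (VL in 128-bit quadwords) - 1, so
 * programming it with the maximum VL shared by all CPUs hides any
 * larger, asymmetric VLs from the host.
 */
static void pkvm_clamp_host_vl(unsigned int max_shared_vl_bytes)
{
	unsigned long len = max_shared_vl_bytes / 16 - 1;

	asm volatile("msr zcr_el2, %0" : : "r" (len));
	asm volatile("isb");
}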