Message ID | 20240704-kvm-arm64-fix-pkvm-sve-vl-v4-0-b6898ab23dc4@kernel.org (mailing list archive) |
---|---|
Series | KVM: arm64: Fix underallocation of storage for SVE state |
On Thu, 04 Jul 2024 18:28:15 +0100, Mark Brown <broonie@kernel.org> wrote:
>
> As observed during review the pKVM support for saving host SVE state is
> broken if an asymmetric system has VLs larger than the maximum shared
> VL, fix this by discovering then using the maximum VL for allocations
> and using RDVL during the save/restore process.

I really don't see why we need such complexity here.

Fuad did post something[1] that did the trick with a far less invasive
change, and it really feels like we are putting the complexity at the
wrong place.

So what's wrong with that approach? I get that you want to shout about
secondary CPUs, but that's an orthogonal problem.

	M.

[1] https://lore.kernel.org/r/20240606092623.2236172-1-tabba@google.com
On Fri, Jul 05, 2024 at 02:20:05PM +0100, Marc Zyngier wrote:
> Mark Brown <broonie@kernel.org> wrote:

> > As observed during review the pKVM support for saving host SVE state is
> > broken if an asymmetric system has VLs larger than the maximum shared
> > VL, fix this by discovering then using the maximum VL for allocations
> > and using RDVL during the save/restore process.

> I really don't see why we need such complexity here.

> Fuad did post something[1] that did the trick with a far less invasive
> change, and it really feels like we are putting the complexity at the
> wrong place.

> So what's wrong with that approach? I get that you want to shout about
> secondary CPUs, but that's an orthogonal problem.

As I've said, from a clarity/fragility point of view I'm not happy with
configuring the vector length to one value then immediately doing things
that assume another value, even if everything is actually lined up in a
way that works. Having uncommented code where you have to go and check
correctness when you see it isn't great; seeing an inconsistency just
raises alarm bells. It is much clearer to write the code in a way that
makes it obvious that the VL we are using is the one the hardware is
using, and for the host save/restore reading the actual VL back seemed
like the most straightforward way to do that.

A similar thing applies with the enumeration code - like I said in reply
to one of Fuad's postings, I originally wrote something that's basically
the same as the patch Fuad posted, but because it is not consistent with
the surrounding code in how it approaches things it just raised
questions about whether the new code was missing something, or whether
there was some problem that should be addressed in the existing code.
Rather than write an extensive changelog and/or comments covering these
considerations it seemed better to just write the code in a consistent
manner so the questions aren't prompted. Keeping the approach consistent
is a bit more code right now but makes the result much easier to reason
about.

The late CPUs thing is really just an answer to the initial "why is this
different, what might we have missed?" question rather than a particular
goal in itself. Adding a warning is as much about documenting the
handling of late CPUs as it is about highlighting any unfortunate
implementation choices we run into.

Basically it's maintainability concerns, especially with the enumeration
code.
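To make the read-back approach concrete: it amounts to something like
the sketch below, where the layout of any saved state is derived from
the VL the hardware reports via RDVL rather than from a value that was
programmed earlier. The kernel has an equivalent helper (sve_get_vl());
the wrapper name here is illustrative only, and the sketch assumes SVE
is already enabled and the assembler needs the extension enabled
explicitly.

```c
/*
 * Minimal sketch of reading back the live vector length.  RDVL returns
 * the currently configured VL in bytes, so any save/restore sized from
 * this value is guaranteed to match the layout the hardware is using.
 */
static inline unsigned int sve_read_vl(void)
{
	unsigned long vl;

	asm volatile(".arch_extension sve\n"
		     "rdvl	%0, #1"
		     : "=r" (vl));

	return vl;
}
```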
Hi all,

On Fri, Jul 05, 2024 at 06:18:50PM +0100, Mark Brown wrote:
> On Fri, Jul 05, 2024 at 02:20:05PM +0100, Marc Zyngier wrote:
> > Mark Brown <broonie@kernel.org> wrote:

> > > > As observed during review the pKVM support for saving host SVE state is
> > > > broken if an asymmetric system has VLs larger than the maximum shared
> > > > VL, fix this by discovering then using the maximum VL for allocations
> > > > and using RDVL during the save/restore process.

> > > I really don't see why we need such complexity here.

The first patch is orthogonal cleanup, and the rest doesn't really add
complexity IIUC.

> > Fuad did post something[1] that did the trick with a far less invasive
> > change, and it really feels like we are putting the complexity at the
> > wrong place.
> >
> > So what's wrong with that approach? I get that you want to shout about
> > secondary CPUs, but that's an orthogonal problem.
>
> As I've said, from a clarity/fragility point of view I'm not happy with
> configuring the vector length to one value then immediately doing things
> that assume another value, even if everything is actually lined up in a
> way that works. Having uncommented code where you have to go and check
> correctness when you see it isn't great; seeing an inconsistency just
> raises alarm bells. It is much clearer to write the code in a way that
> makes it obvious that the VL we are using is the one the hardware is
> using, and for the host save/restore reading the actual VL back seemed
> like the most straightforward way to do that.
>
> A similar thing applies with the enumeration code - like I said in reply
> to one of Fuad's postings, I originally wrote something that's basically
> the same as the patch Fuad posted, but because it is not consistent with
> the surrounding code in how it approaches things it just raised
> questions about whether the new code was missing something, or whether
> there was some problem that should be addressed in the existing code.
> Rather than write an extensive changelog and/or comments covering these
> considerations it seemed better to just write the code in a consistent
> manner so the questions aren't prompted. Keeping the approach consistent
> is a bit more code right now but makes the result much easier to reason
> about.
>
> The late CPUs thing is really just an answer to the initial "why is this
> different, what might we have missed?" question rather than a particular
> goal in itself. Adding a warning is as much about documenting the
> handling of late CPUs as it is about highlighting any unfortunate
> implementation choices we run into.
>
> Basically it's maintainability concerns, especially with the enumeration
> code.

I tend to agree here.

It's probably best to stick to one convention everywhere for how the
SVE regs are laid out for a given VL. There's nothing wrong with Fuad's
fixed sve_ffr_offset(), but it's different from the VL-dependent offset
already used elsewhere and so risks causing confusion further down the
line.

One thing confuses me:

The host could never use over-max VLs except in non-preemptible kernel
code, since code doing that would be non-migratable to other physical
CPUs. This is done to probe SVE only, and the extra bits in the vector
registers are never used at all.

Can't pKVM just hide the non-symmetrically supported VLs using ZCR_EL2,
just as regular KVM does for the guest?

(I may be making bad assumptions about pKVM's relationship with the
host kernel.)

Cheers
---Dave
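For reference, the VL-dependent convention Dave refers to looks roughly
like the following in the kernel headers (a paraphrased sketch built on
the existing SVE_SIG_* layout macros); because the Z and P register
blocks both scale with the vector length, the FFR offset is only
correct when the VL passed in matches the VL the hardware is actually
using:

```c
#include <asm/fpsimd.h>			/* sve_vq_from_vl() */
#include <uapi/asm/sigcontext.h>	/* SVE_SIG_* layout macros */

/*
 * Offset of FFR from the start of the register data for a given VL.
 * The FFR predicate sits after the Z and P register blocks, both of
 * which are sized by the vector length.
 */
static inline size_t sve_ffr_offset(int vl)
{
	return SVE_SIG_FFR_OFFSET(sve_vq_from_vl(vl)) - SVE_SIG_REGS_OFFSET;
}
```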
On Mon, Jul 08, 2024 at 04:30:30PM +0100, Dave Martin wrote:

> One thing confuses me:

> The host could never use over-max VLs except in non-preemptible kernel
> code, since code doing that would be non-migratable to other physical
> CPUs. This is done to probe SVE only, and the extra bits in the vector
> registers are never used at all.

> Can't pKVM just hide the non-symmetrically supported VLs using ZCR_EL2,
> just as regular KVM does for the guest?

> (I may be making bad assumptions about pKVM's relationship with the host
> kernel.)

That'd be another way to do it, constraining the VLs the host can set -
I assume there's something about how pKVM or hVHE does things that
makes that a problem.
On Mon, Jul 08, 2024 at 04:30:30PM +0100, Dave Martin wrote:
> On Fri, Jul 05, 2024 at 06:18:50PM +0100, Mark Brown wrote:
> > On Fri, Jul 05, 2024 at 02:20:05PM +0100, Marc Zyngier wrote:
> > > Mark Brown <broonie@kernel.org> wrote:

> > > > As observed during review the pKVM support for saving host SVE state is
> > > > broken if an asymmetric system has VLs larger than the maximum shared
> > > > VL, fix this by discovering then using the maximum VL for allocations
> > > > and using RDVL during the save/restore process.

> > > I really don't see why we need such complexity here.

> The first patch is orthogonal cleanup, and the rest doesn't really add
> complexity IIUC.

...

> > Basically it's maintainability concerns, especially with the enumeration
> > code.

> I tend to agree here.

Did anyone have any further thoughts on this? It's been roughly a
release cycle since I originally posted this, and there have been no
changes requested since -rc1 (which was itself just a rebase).

> The host could never use over-max VLs except in non-preemptible kernel
> code, since code doing that would be non-migratable to other physical
> CPUs. This is done to probe SVE only, and the extra bits in the vector
> registers are never used at all.

> Can't pKVM just hide the non-symmetrically supported VLs using ZCR_EL2,
> just as regular KVM does for the guest?

> (I may be making bad assumptions about pKVM's relationship with the host
> kernel.)

That's one for the pKVM people.
Hi,

> > Can't pKVM just hide the non-symmetrically supported VLs using ZCR_EL2,
> > just as regular KVM does for the guest?
> >
> > (I may be making bad assumptions about pKVM's relationship with the host
> > kernel.)
>
> That's one for the pKVM people.

Yes, but that's not really the issue here, unless I'm missing something
else. The issue is that pKVM needs to store the host's SVE state too, to
be able to restore it. So hiding non-symmetrically supported VLs for the
guests shouldn't be relevant.

Cheers,
/fuad
On Fri, Sep 06, 2024 at 04:35:29PM +0100, Fuad Tabba wrote:

> > > Can't pKVM just hide the non-symmetrically supported VLs using ZCR_EL2,
> > > just as regular KVM does for the guest?

> > > (I may be making bad assumptions about pKVM's relationship with the host
> > > kernel.)

> > That's one for the pKVM people.

> Yes, but that's not really the issue here, unless I'm missing
> something else. The issue is that pKVM needs to store the host's SVE
> state too, to be able to restore it. So hiding non-symmetrically
> supported VLs for the guests shouldn't be relevant.

If the host kernel is also running at EL1 and it's only the hypervisor
running at EL2, then the hypervisor can control the VLs that the host
kernel is able to access?
On Fri, 6 Sept 2024 at 17:10, Mark Brown <broonie@kernel.org> wrote:
>
> On Fri, Sep 06, 2024 at 04:35:29PM +0100, Fuad Tabba wrote:
>
> > > > Can't pKVM just hide the non-symmetrically supported VLs using ZCR_EL2,
> > > > just as regular KVM does for the guest?
>
> > > > (I may be making bad assumptions about pKVM's relationship with the host
> > > > kernel.)
>
> > > That's one for the pKVM people.
>
> > Yes, but that's not really the issue here, unless I'm missing
> > something else. The issue is that pKVM needs to store the host's SVE
> > state too, to be able to restore it. So hiding non-symmetrically
> > supported VLs for the guests shouldn't be relevant.
>
> If the host kernel is also running at EL1 and it's only the hypervisor
> running at EL2, then the hypervisor can control the VLs that the host
> kernel is able to access?

Yes it can. But do we want to impose limits on host VLs when running
pKVM that might not exist without pKVM?

Although AFAIK such hardware doesn't exist in practice, the reason we
went down this rabbit hole from the beginning was to ensure that we
wouldn't run into problems if it were to happen.

Thanks,
/fuad
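As a rough sketch of the mechanism being discussed - assuming the host
runs at EL1 below the pKVM hypervisor, with the function name and call
site illustrative rather than taken from any posted patch - capping the
host's vector lengths could look something like the following,
mirroring what regular KVM already does for guests via ZCR_EL2:

```c
#include <asm/barrier.h>	/* isb() */
#include <asm/fpsimd.h>		/* sve_vq_from_vl() */
#include <asm/sysreg.h>		/* write_sysreg_s(), SYS_ZCR_EL2 */

/*
 * Cap the VLs the host can select: the effective vector length at EL1
 * is limited by ZCR_EL2.LEN, which encodes the VL as a number of
 * 128-bit quadwords minus one.
 */
static void pkvm_limit_host_vl(unsigned int max_shared_vl)
{
	u64 len = sve_vq_from_vl(max_shared_vl) - 1;

	write_sysreg_s(len & ZCR_ELx_LEN_MASK, SYS_ZCR_EL2);
	isb();
}
```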
On Fri, Sep 06, 2024 at 05:14:09PM +0100, Fuad Tabba wrote:
> On Fri, 6 Sept 2024 at 17:10, Mark Brown <broonie@kernel.org> wrote:
> > On Fri, Sep 06, 2024 at 04:35:29PM +0100, Fuad Tabba wrote:

> > > Yes, but that's not really the issue here, unless I'm missing
> > > something else. The issue is that pKVM needs to store the host's SVE
> > > state too, to be able to restore it. So hiding non-symmetrically
> > > supported VLs for the guests shouldn't be relevant.

> > If the host kernel is also running at EL1 and it's only the hypervisor
> > running at EL2, then the hypervisor can control the VLs that the host
> > kernel is able to access?

> Yes it can. But do we want to impose limits on host VLs when running
> pKVM that might not exist without pKVM?

I mean, at the minute the host kernel will just not configure any
non-shared VLs, so pKVM isn't making a difference anyway. Even when we
add kernel mode SVE usage, kernel mode FP is preemptible and schedulable
so we'd not likely want to use non-shared VLs there either. If someone
ever felt moved to add support for using any non-shared VLs the most
likely usage would be for userspace, and we'd constrain any tasks using
SVE to the cores that support their VLs, similar to how we handle CPUs
with no 32-bit support. Hopefully the scheduler would cope well with
that.

> Although AFAIK such hardware doesn't exist in practice, the reason we
> went down this rabbit hole from the beginning was to ensure that we
> wouldn't run into problems if it were to happen.

Yes, it's not an issue with any presently known hardware - the issue is
leaving nasty surprises should someone build it rather than anything
immediately practical. Ideally they won't.

My general feeling is that it would have been perfectly fine for pKVM to
enforce what the host kernel wants to do anyway; if we ever do add
support for using asymmetric VLs and care about doing so on a system
running pKVM then dealing with pKVM imposed limits at that time seems
more than reasonable. It probably wouldn't be the largest part of the
work. Equally, we now have the code so we may as well use it? It's not
imposing huge overheads.
Hi,

On Fri, 6 Sept 2024 at 19:03, Mark Brown <broonie@kernel.org> wrote:
>
> On Fri, Sep 06, 2024 at 05:14:09PM +0100, Fuad Tabba wrote:
> > On Fri, 6 Sept 2024 at 17:10, Mark Brown <broonie@kernel.org> wrote:
> > > On Fri, Sep 06, 2024 at 04:35:29PM +0100, Fuad Tabba wrote:
>
> > > > Yes, but that's not really the issue here, unless I'm missing
> > > > something else. The issue is that pKVM needs to store the host's SVE
> > > > state too, to be able to restore it. So hiding non-symmetrically
> > > > supported VLs for the guests shouldn't be relevant.
>
> > > If the host kernel is also running at EL1 and it's only the hypervisor
> > > running at EL2, then the hypervisor can control the VLs that the host
> > > kernel is able to access?
>
> > Yes it can. But do we want to impose limits on host VLs when running
> > pKVM that might not exist without pKVM?
>
> I mean, at the minute the host kernel will just not configure any
> non-shared VLs, so pKVM isn't making a difference anyway. Even when we
> add kernel mode SVE usage, kernel mode FP is preemptible and schedulable
> so we'd not likely want to use non-shared VLs there either. If someone
> ever felt moved to add support for using any non-shared VLs the most
> likely usage would be for userspace, and we'd constrain any tasks using
> SVE to the cores that support their VLs, similar to how we handle CPUs
> with no 32-bit support. Hopefully the scheduler would cope well with
> that.
>
> > Although AFAIK such hardware doesn't exist in practice, the reason we
> > went down this rabbit hole from the beginning was to ensure that we
> > wouldn't run into problems if it were to happen.
>
> Yes, it's not an issue with any presently known hardware - the issue is
> leaving nasty surprises should someone build it rather than anything
> immediately practical. Ideally they won't.
>
> My general feeling is that it would have been perfectly fine for pKVM to
> enforce what the host kernel wants to do anyway; if we ever do add
> support for using asymmetric VLs and care about doing so on a system
> running pKVM then dealing with pKVM imposed limits at that time seems
> more than reasonable. It probably wouldn't be the largest part of the
> work. Equally, we now have the code so we may as well use it? It's not
> imposing huge overheads.

From pKVM's perspective this would work, and other than the potential
for future support for using asymmetric VLs I don't really see a
problem. Much of the complexity was an attempt not to make pKVM more
restrictive than other modes.

Cheers,
/fuad
On Tue, Sep 10, 2024 at 01:49:37PM +0100, Fuad Tabba wrote:
> On Fri, 6 Sept 2024 at 19:03, Mark Brown <broonie@kernel.org> wrote:

> > My general feeling is that it would have been perfectly fine for pKVM to
> > enforce what the host kernel wants to do anyway; if we ever do add
> > support for using asymmetric VLs and care about doing so on a system
> > running pKVM then dealing with pKVM imposed limits at that time seems
> > more than reasonable. It probably wouldn't be the largest part of the
> > work. Equally, we now have the code so we may as well use it? It's not
> > imposing huge overheads.

> From pKVM's perspective this would work, and other than the potential
> for future support for using asymmetric VLs I don't really see a
> problem. Much of the complexity was an attempt not to make pKVM more
> restrictive than other modes.

Right, so it's just a question of whether we want to use the code that
doesn't impose the limit, given that we have it already. I'm throwing a
patch that limits the host VL into CI; it should be out today.
As observed during review the pKVM support for saving host SVE state is
broken if an asymmetric system has VLs larger than the maximum shared
VL, fix this by discovering then using the maximum VL for allocations
and using RDVL during the save/restore process.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
Changes in v4:
- Roll in Catalin's acks.
- Link to v3: https://lore.kernel.org/r/20240607-kvm-arm64-fix-pkvm-sve-vl-v3-0-59e781706d65@kernel.org

Changes in v3:
- Replace %u with %lu in late CPU error message.

Changes in v2:
- Downgrade check for a late CPU increasing maximum VL to a warning only
  but do it unconditionally since pKVM prevents late CPUs anyway.
- Commit log tweaks.
- Link to v1: https://lore.kernel.org/r/20240605-kvm-arm64-fix-pkvm-sve-vl-v1-0-680d6b43b4c1@kernel.org

---
Mark Brown (4):
      arm64/fpsimd: Introduce __bit_to_vl() helper
      arm64/fpsimd: Discover maximum vector length implemented by any CPU
      KVM: arm64: Fix FFR offset calculation for pKVM host state save and restore
      KVM: arm64: Avoid underallocating storage for host SVE state

 arch/arm64/include/asm/fpsimd.h         | 17 +++++++++++++++
 arch/arm64/include/asm/kvm_host.h       |  2 +-
 arch/arm64/include/asm/kvm_hyp.h        |  3 ++-
 arch/arm64/include/asm/kvm_pkvm.h       |  2 +-
 arch/arm64/kernel/fpsimd.c              | 38 +++++++++++++++++++++++++++------
 arch/arm64/kvm/hyp/fpsimd.S             |  5 +++++
 arch/arm64/kvm/hyp/include/hyp/switch.h |  2 +-
 arch/arm64/kvm/hyp/nvhe/hyp-main.c      |  6 +++---
 arch/arm64/kvm/hyp/nvhe/pkvm.c          |  2 +-
 arch/arm64/kvm/reset.c                  |  6 +++---
 10 files changed, 65 insertions(+), 18 deletions(-)
---
base-commit: afb91f5f8ad7af172d993a34fde1947892408f53
change-id: 20240604-kvm-arm64-fix-pkvm-sve-vl-13cd71fd7db0

Best regards,
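As context for the first two patches listed above, here is a hedged
sketch of what the helper and the discovery step might look like,
assuming the kernel's existing vq bitmap convention where bit 0
corresponds to SVE_VQ_MAX; apart from __bit_to_vl() itself, the names
below are illustrative and the actual patches may differ:

```c
#include <linux/bitmap.h>
#include <asm/fpsimd.h>		/* sve_vl_from_vq(), SVE_VQ_MAX */

/*
 * Sketch: convert a set bit in a vq bitmap straight to a VL in bytes,
 * following the existing convention that bit 0 is SVE_VQ_MAX.
 */
static inline unsigned int __bit_to_vl(unsigned int bit)
{
	return sve_vl_from_vq(SVE_VQ_MAX - bit);
}

/* Illustrative: the largest VL implemented by any CPU probed so far */
static unsigned int vec_max_impl_vl;

static void vec_update_max_impl_vl(const unsigned long *vq_map)
{
	/* Lowest set bit == largest vq supported by this CPU */
	unsigned int vl = __bit_to_vl(find_first_bit(vq_map, SVE_VQ_MAX));

	if (vl > vec_max_impl_vl)
		vec_max_impl_vl = vl;
}
```

Sizing per-CPU SVE state buffers from a maximum discovered this way,
rather than from the maximum VL shared by all CPUs, is what avoids the
underallocation the series fixes.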