
[v5,0/4] KVM: arm64: Errata management for VM Live migration

Message ID: 20250124151732.6072-1-shameerali.kolothum.thodi@huawei.com

Message

Shameer Kolothum Jan. 24, 2025, 3:17 p.m. UTC
Hi,

v4 --> v5
https://lore.kernel.org/kvmarm/20241218105345.73472-1-shameerali.kolothum.thodi@huawei.com/

-Addressed comments from Marc:
 -Added a hypercall to retrieve the version and the number of supported
  target implementation CPUs.
 -Added a check for KVM hypercall services availability.
-Removed R-by tags from Connie & Sebastian as patches #2 & #4 changed a bit.
 Please take another look.
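For readers following the discovery flow in the changelog, here is a user-space C model of the guest side. It checks that the KVM hypercall services are available, asks for the interface version and the number of target implementation CPUs, and then fetches each CPU's MIDR/REVIDR by index. This is a sketch under stated assumptions, not the series' code. The service numbers, the version value 1, the placement of return values in a1/a2, and the names fake_hvc() and discover_target_impls() are all hypothetical. The availability check stands in for querying the KVM vendor-hypercall features bitmap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical service numbers, for illustration only. */
enum hyp_service {
	HYP_SVC_DISCOVER_IMPL_VER  = 2,
	HYP_SVC_DISCOVER_IMPL_CPUS = 3,
};

/* SMCCC return codes. */
#define SMCCC_RET_SUCCESS            0
#define SMCCC_RET_NOT_SUPPORTED      (-1)
#define SMCCC_RET_INVALID_PARAMETER  (-3)

struct hyp_res { int64_t a0; uint64_t a1, a2; };
struct target_impl_cpu { uint64_t midr, revidr; };

/* Stand-in for the host: a bitmap of available services plus the
 * target CPU table supplied by the VMM. */
struct fake_hyp {
	uint32_t services;
	const struct target_impl_cpu *cpus;
	size_t nr_cpus;
};

static int svc_available(const struct fake_hyp *h, enum hyp_service svc)
{
	return (h->services >> svc) & 1;
}

/* Host side: answer one (hypothetical) hypercall. */
static struct hyp_res fake_hvc(const struct fake_hyp *h, enum hyp_service svc,
			       uint64_t arg)
{
	struct hyp_res r = { SMCCC_RET_NOT_SUPPORTED, 0, 0 };

	if (!svc_available(h, svc))
		return r;
	if (svc == HYP_SVC_DISCOVER_IMPL_VER) {
		r.a0 = SMCCC_RET_SUCCESS;
		r.a1 = 1;             /* interface version (assumed) */
		r.a2 = h->nr_cpus;    /* number of target implementations */
	} else if (arg >= h->nr_cpus) {
		r.a0 = SMCCC_RET_INVALID_PARAMETER;
	} else {
		r.a0 = SMCCC_RET_SUCCESS;
		r.a1 = h->cpus[arg].midr;
		r.a2 = h->cpus[arg].revidr;
	}
	return r;
}

/* Guest side: returns the number of entries written to out[], or 0 if
 * anything is missing, in which case the guest keeps using its own
 * MIDR/REVIDR. */
static size_t discover_target_impls(const struct fake_hyp *h,
				    struct target_impl_cpu *out, size_t max)
{
	struct hyp_res r;
	size_t i, n;

	assert(out != NULL || max == 0);

	/* Check availability first (models the KVM features bitmap query). */
	if (!svc_available(h, HYP_SVC_DISCOVER_IMPL_VER) ||
	    !svc_available(h, HYP_SVC_DISCOVER_IMPL_CPUS))
		return 0;

	r = fake_hvc(h, HYP_SVC_DISCOVER_IMPL_VER, 0);
	if (r.a0 != SMCCC_RET_SUCCESS || r.a1 != 1 || r.a2 == 0 || r.a2 > max)
		return 0;
	n = (size_t)r.a2;

	for (i = 0; i < n; i++) {
		r = fake_hvc(h, HYP_SVC_DISCOVER_IMPL_CPUS, i);
		if (r.a0 != SMCCC_RET_SUCCESS)
			return 0;
		out[i].midr = r.a1;
		out[i].revidr = r.a2;
	}
	return n;
}
```

Whenever discovery fails, the sketch returns 0 and the guest falls back to its own MIDR/REVIDR, which is the behaviour without the series.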

This can be sanity-tested with the QEMU branch here:
https://github.com/hisilicon/qemu/tree/v9.0-nv-rfcv4-vcpu-model-v2-target-cpu-errata
(branch based on Eric's/Connie's NV + custom CPU series)
E.g., to specify target implementation CPUs:
-machine virt,.., x-target-impl-cpus=0xMIDR1:0xREVIDR1-0xMIDR2:0xREVIDR2
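The x-target-impl-cpus value above is a "-"-separated list of MIDR:REVIDR pairs. Below is a minimal parser for that syntax. It assumes exactly the format shown in the example, it is not the QEMU branch's code, and parse_target_impl_cpus() is a hypothetical name.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

struct target_impl_cpu { uint64_t midr, revidr; };

/* Parse one unsigned number (decimal, 0x-hex or 0-octal, as strtoull()
 * with base 0 allows). A leading digit is required so that a sign is
 * not silently accepted. */
static int parse_u64(const char **p, uint64_t *val)
{
	char *end;
	unsigned long long v;

	if (!isdigit((unsigned char)**p))
		return -1;
	errno = 0;
	v = strtoull(*p, &end, 0);
	if (errno != 0 || end == *p)
		return -1;
	*val = (uint64_t)v;
	*p = end;
	return 0;
}

/* Hypothetical parser for "MIDR:REVIDR[-MIDR:REVIDR...]".
 * Returns the number of entries stored in out[], or -1 on error. */
static int parse_target_impl_cpus(const char *s, struct target_impl_cpu *out,
				  int max)
{
	int n = 0;

	assert(s != NULL && out != NULL);
	for (;;) {
		if (n == max || parse_u64(&s, &out[n].midr) || *s++ != ':' ||
		    parse_u64(&s, &out[n].revidr))
			return -1;
		n++;
		if (*s == '\0')
			return n;
		if (*s++ != '-')
			return -1;
	}
}
```

The digit check matters here: strtoull() would otherwise accept a leading sign, which collides with the "-" used as the pair separator.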

Please take a look and let me know your feedback.

Thanks,
Shameer

v3 --> v4 (minor updates)
https://lore.kernel.org/kvmarm/20241209115311.40496-1-shameerali.kolothum.thodi@huawei.com/

 -Changed MIDR/REVIDR to 64 bits based on feedback from Connie
  and Marc (Patch #3).
 -Added R-by tags from Sebastian (Thanks!).

RFC v2 --> v3
https://lore.kernel.org/kvmarm/20241024094012.29452-1-shameerali.kolothum.thodi@huawei.com/

 -Addressed comments from Oliver (Thanks!).
 -Use the implementation CPUs' MIDR/REVIDR, when set, in the
  _midr_range() functions (Patches #1 & #3).
 -New hypercall for retrieving implementation CPUs (Patch #2).
 -Dropped RFC.

RFC v1 --> RFC v2:
https://lore.kernel.org/kvmarm/20241011075053.80540-1-shameerali.kolothum.thodi@huawei.com/

 -Introduced hypercalls to retrieve target CPU info from the user space VMM.
  See patch #1 for details.
 -Patch #2 uses the hypercall to retrieve the target CPU info, if any.
 -Use the target CPUs' MIDR/REVIDR in errata enablement. See patch #3.
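Once a target list is known, the _midr_range() checks consult the target CPUs instead of the local MIDR/REVIDR. The C sketch below models that decision in user space. The MIDR field layout and the model/variant/revision comparison mirror the arm64 kernel's midr_is_cpu_model_range(). Three things are this sketch's own assumptions about the intended semantics, not the series' implementation: the rule that a workaround is enabled if ANY target CPU is affected, the REVIDR "fixed" check, and the helper names cpu_is_affected() and erratum_applies().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* MIDR_EL1 layout: Revision [3:0], PartNum [15:4], Architecture [19:16],
 * Variant [23:20], Implementer [31:24]. */
#define MIDR_REVISION_MASK   UINT64_C(0xf)
#define MIDR_VARIANT_SHIFT   20
#define MIDR_VARIANT_MASK    (UINT64_C(0xf) << MIDR_VARIANT_SHIFT)
#define MIDR_CPU_MODEL_MASK  UINT64_C(0xff0ffff0) /* implementer|arch|partnum */
#define MIDR_CPU_VAR_REV(var, rev) \
	(((uint64_t)(var) << MIDR_VARIANT_SHIFT) | (uint64_t)(rev))

struct midr_range { uint64_t model, rv_min, rv_max; };
struct target_impl_cpu { uint64_t midr, revidr; };
/* For a given variant/revision, REVIDR bits that mean "already fixed". */
struct revidr_fix { uint64_t midr_rv, revidr_mask; };

static bool midr_in_range(uint64_t midr, const struct midr_range *r)
{
	uint64_t rv = midr & (MIDR_REVISION_MASK | MIDR_VARIANT_MASK);

	return (midr & MIDR_CPU_MODEL_MASK) == r->model &&
	       rv >= r->rv_min && rv <= r->rv_max;
}

static bool cpu_is_affected(const struct target_impl_cpu *cpu,
			    const struct midr_range *r,
			    const struct revidr_fix *fixes, size_t nr_fixes)
{
	uint64_t rv = cpu->midr & (MIDR_REVISION_MASK | MIDR_VARIANT_MASK);
	size_t i;

	if (!midr_in_range(cpu->midr, r))
		return false;
	for (i = 0; i < nr_fixes; i++)
		if (rv == fixes[i].midr_rv && (cpu->revidr & fixes[i].revidr_mask))
			return false;
	return true;
}

/* With a target list, the workaround is needed if ANY target CPU is
 * affected, since the guest may run on any of them after migration.
 * Without one, only the CPU we run on ("self") matters. */
static bool erratum_applies(const struct target_impl_cpu *targets,
			    size_t nr_targets,
			    const struct target_impl_cpu *self,
			    const struct midr_range *r,
			    const struct revidr_fix *fixes, size_t nr_fixes)
{
	size_t i;

	assert(self != NULL);
	if (nr_targets == 0)
		return cpu_is_affected(self, r, fixes, nr_fixes);
	for (i = 0; i < nr_targets; i++)
		if (cpu_is_affected(&targets[i], r, fixes, nr_fixes))
			return true;
	return false;
}
```

The "any target" rule errs on the side of enabling workarounds. A guest that enables one it turns out not to need loses some performance. A guest that misses one it does need may be incorrect on the destination host.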

Background from v1:

On ARM64 platforms most of the errata workarounds are based on CPU
MIDR/REVIDR values and a number of these workarounds need to be
implemented by the Guest kernel as well. This creates a problem when
Guest needs to be migrated to a platform that differs in these
MIDR/REVIDR values even if the VMM can come up with a common minimum
feature list for the Guest using the recently introduced "Writable
ID registers" support.

(This is roughly based on a discussion I had with Marc and Oliver
at KVM Forum. Marc outlined his idea for a solution and this is an
attempt to implement it. Thanks to both, and I take all the blame
if this is nowhere near what is intended/required.)

Shameer Kolothum (4):
  arm64: Modify _midr_range() functions to read MIDR/REVIDR internally
  KVM: arm64: Introduce hypercall support for retrieving target
    implementations
  KVM: arm64: Report all the KVM/arm64-specific hypercalls
  arm64: paravirt: Enable errata based on implementation CPUs

 Documentation/virt/kvm/arm/hypercalls.rst | 59 +++++++++++++++++++++++
 arch/arm64/include/asm/cputype.h          | 50 +++++++++++++------
 arch/arm64/include/asm/mmu.h              |  3 +-
 arch/arm64/include/asm/paravirt.h         |  3 ++
 arch/arm64/kernel/cpu_errata.c            | 37 ++++++++++----
 arch/arm64/kernel/cpufeature.c            |  8 +--
 arch/arm64/kernel/image-vars.h            |  2 +
 arch/arm64/kernel/paravirt.c              | 58 ++++++++++++++++++++++
 arch/arm64/kernel/proton-pack.c           | 17 +++----
 arch/arm64/kvm/hypercalls.c               |  6 ++-
 arch/arm64/kvm/vgic/vgic-v3.c             |  2 +-
 drivers/clocksource/arm_arch_timer.c      |  2 +-
 include/linux/arm-smccc.h                 | 17 +++++++
 13 files changed, 222 insertions(+), 42 deletions(-)

Comments

Sebastian Ott Feb. 4, 2025, 4:45 p.m. UTC | #1
Hey,

On Fri, 24 Jan 2025, Shameer Kolothum wrote:
> On ARM64 platforms most of the errata workarounds are based on CPU
> MIDR/REVIDR values and a number of these workarounds need to be
> implemented by the Guest kernel as well. This creates a problem when
> Guest needs to be migrated to a platform that differs in these
> MIDR/REVIDR values even if the VMM can come up with a common minimum
> feature list for the Guest using the recently introduced "Writable
> ID registers" support.

Currently MIDR/REVIDR are still RO and guest access is not trapped - so
even with the errata management patches in place the guest state would
change and a migration (between hosts that differ in these regs) would
not be possible. Are there any plans to allow to actually change these?

Thanks,
Sebastian
Marc Zyngier Feb. 4, 2025, 5:11 p.m. UTC | #2
On Tue, 04 Feb 2025 16:45:38 +0000,
Sebastian Ott <sebott@redhat.com> wrote:
> 
> Hey,
> 
> On Fri, 24 Jan 2025, Shameer Kolothum wrote:
> > On ARM64 platforms most of the errata workarounds are based on CPU
> > MIDR/REVIDR values and a number of these workarounds need to be
> > implemented by the Guest kernel as well. This creates a problem when
> > Guest needs to be migrated to a platform that differs in these
> > MIDR/REVIDR values even if the VMM can come up with a common minimum
> > feature list for the Guest using the recently introduced "Writable
> > ID registers" support.
> 
> Currently MIDR/REVIDR are still RO and guest access is not trapped - so
> even with the errata management patches in place the guest state would
> change and a migration (between hosts that differ in these regs) would
> not be possible. Are there any plans to allow to actually change these?

Sure thing. We only need a victim! :)

	M.
Sebastian Ott Feb. 4, 2025, 5:42 p.m. UTC | #3
On Tue, 4 Feb 2025, Marc Zyngier wrote:
> On Tue, 04 Feb 2025 16:45:38 +0000,
> Sebastian Ott <sebott@redhat.com> wrote:
>>
>> Hey,
>>
>> On Fri, 24 Jan 2025, Shameer Kolothum wrote:
>>> On ARM64 platforms most of the errata workarounds are based on CPU
>>> MIDR/REVIDR values and a number of these workarounds need to be
>>> implemented by the Guest kernel as well. This creates a problem when
>>> Guest needs to be migrated to a platform that differs in these
>>> MIDR/REVIDR values even if the VMM can come up with a common minimum
>>> feature list for the Guest using the recently introduced "Writable
>>> ID registers" support.
>>
>> Currently MIDR/REVIDR are still RO and guest access is not trapped - so
>> even with the errata management patches in place the guest state would
>> change and a migration (between hosts that differ in these regs) would
>> not be possible. Are there any plans to allow to actually change these?
>
> Sure thing. We only need a victim! :)

;-) Nice. I'll hack smth up then.

Sebastian
Marc Zyngier Feb. 4, 2025, 6:15 p.m. UTC | #4
On Tue, 04 Feb 2025 17:42:04 +0000,
Sebastian Ott <sebott@redhat.com> wrote:
> 
> On Tue, 4 Feb 2025, Marc Zyngier wrote:
> > On Tue, 04 Feb 2025 16:45:38 +0000,
> > Sebastian Ott <sebott@redhat.com> wrote:
> >> 
> >> Hey,
> >> 
> >> On Fri, 24 Jan 2025, Shameer Kolothum wrote:
> >>> On ARM64 platforms most of the errata workarounds are based on CPU
> >>> MIDR/REVIDR values and a number of these workarounds need to be
> >>> implemented by the Guest kernel as well. This creates a problem when
> >>> Guest needs to be migrated to a platform that differs in these
> >>> MIDR/REVIDR values even if the VMM can come up with a common minimum
> >>> feature list for the Guest using the recently introduced "Writable
> >>> ID registers" support.
> >> 
> >> Currently MIDR/REVIDR are still RO and guest access is not trapped - so
> >> even with the errata management patches in place the guest state would
> >> change and a migration (between hosts that differ in these regs) would
> >> not be possible. Are there any plans to allow to actually change these?
> > 
> > Sure thing. We only need a victim! :)
> 
> ;-) Nice. I'll hack smth up then.

Great.

Ideally, you would get rid of all the remaining invariant registers
(MIDR, REVIDR and AIDR). But you must preserve the current behaviour
as the default, sampling these registers on the CPU that initialises
KVM, and preserve the values for userspace to observe unless they are
written to (yes, that's broken, but we're stuck with that).

Also, please don't trap MIDR_EL1. That's very pointless. You only need
to trap REVIDR and AIDR via HCR_EL2.TID1.

Thanks,

	M.
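The suggestion above can be modelled in a few lines of user-space C. The VM keeps its own copies of MIDR, REVIDR and AIDR. These start from the values sampled on the CPU that initialised KVM and can be overridden by userspace. MIDR is exposed through VPIDR_EL2, whose value is what EL1 reads of MIDR_EL1 return, so it needs no trap. REVIDR and AIDR reads are trapped with HCR_EL2.TID1 (bit 16) and emulated. The struct layouts and function names are hypothetical and this is not KVM's code. Setting TID1 unconditionally is a simplifying assumption of the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* HCR_EL2.TID1 (bit 16): traps EL1 reads of, among others, REVIDR_EL1
 * and AIDR_EL1 to EL2. */
#define HCR_TID1 (UINT64_C(1) << 16)

enum inv_reg { INV_MIDR, INV_REVIDR, INV_AIDR, NR_INV_REGS };

/* Hypothetical per-VM copy of the formerly invariant ID registers. */
struct vm_id_regs { uint64_t val[NR_INV_REGS]; };

/* Hypothetical per-vCPU EL2 state touched by this sketch. */
struct vcpu_ctx { uint64_t hcr_el2, vpidr_el2; };

/* Default behaviour: start from the values sampled on the CPU that
 * initialised KVM. */
static void vm_id_regs_init(struct vm_id_regs *vm,
			    const uint64_t boot_sample[NR_INV_REGS])
{
	int i;

	for (i = 0; i < NR_INV_REGS; i++)
		vm->val[i] = boot_sample[i];
}

/* Userspace accessors: reads return the sampled value unless written. */
static uint64_t vm_get_reg(const struct vm_id_regs *vm, enum inv_reg r)
{
	return vm->val[r];
}

static void vm_set_reg(struct vm_id_regs *vm, enum inv_reg r, uint64_t v)
{
	vm->val[r] = v;
}

static void vcpu_configure(struct vcpu_ctx *v, const struct vm_id_regs *vm)
{
	/* EL1 reads of MIDR_EL1 return VPIDR_EL2: no trap needed. */
	v->vpidr_el2 = vm->val[INV_MIDR];
	/* No VPIDR_EL2-style register exists for REVIDR/AIDR, so trap them. */
	v->hcr_el2 |= HCR_TID1;
}

/* Emulation of a trapped TID1 read (MIDR never reaches here). */
static uint64_t emulate_tid1_read(const struct vm_id_regs *vm, enum inv_reg r)
{
	assert(r == INV_REVIDR || r == INV_AIDR);
	return vm->val[r];
}
```

When userspace writes nothing, it keeps observing the boot-time sample. That preserves the current default behaviour described above.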