| Message ID | 20250205132222.55816-5-shameerali.kolothum.thodi@huawei.com |
|---|---|
| State | New |
| Series | KVM: arm64: Errata management for VM Live migration |
On Wed, Feb 05, 2025 at 01:22:22PM +0000, Shameer Kolothum wrote:
>  static inline bool is_midr_in_range(struct midr_range const *range)
>  {
> -	return midr_is_cpu_model_range(read_cpuid_id(), range->model,
> -				       range->rv_min, range->rv_max);
> +	int i;
> +
> +	if (!target_impl_cpu_num)
> +		return midr_is_cpu_model_range(read_cpuid_id(), range->model,
> +					       range->rv_min, range->rv_max);
> +
> +	for (i = 0; i < target_impl_cpu_num; i++) {
> +		if (midr_is_cpu_model_range(target_impl_cpus[i].midr,
> +					    range->model,
> +					    range->rv_min, range->rv_max))
> +			return true;
> +	}
> +	return false;
>  }

It's an interesting approach, but how does this work in practice if an
erratum requires a firmware counterpart? Do we expect the firmware on all
machines involved to have workarounds for the other machines? Or is KVM
going to intercept those SMCs and pretend the EL3 counterpart is there?
On Fri, 07 Feb 2025 14:08:44 +0000,
Catalin Marinas <catalin.marinas@arm.com> wrote:
>
> On Wed, Feb 05, 2025 at 01:22:22PM +0000, Shameer Kolothum wrote:
> >  static inline bool is_midr_in_range(struct midr_range const *range)
> >  {
> > -	return midr_is_cpu_model_range(read_cpuid_id(), range->model,
> > -				       range->rv_min, range->rv_max);
> > +	int i;
> > +
> > +	if (!target_impl_cpu_num)
> > +		return midr_is_cpu_model_range(read_cpuid_id(), range->model,
> > +					       range->rv_min, range->rv_max);
> > +
> > +	for (i = 0; i < target_impl_cpu_num; i++) {
> > +		if (midr_is_cpu_model_range(target_impl_cpus[i].midr,
> > +					    range->model,
> > +					    range->rv_min, range->rv_max))
> > +			return true;
> > +	}
> > +	return false;
> >  }
>
> It's an interesting approach, but how does this work in practice if an
> erratum requires a firmware counterpart? Do we expect the firmware on all
> machines involved to have workarounds for the other machines? Or is KVM
> going to intercept those SMCs and pretend the EL3 counterpart is there?

KVM already traps SMCs, and could do something on behalf of the guest
(such as pretending that the mitigation has happened if not on the
correct host) *IF* the mitigation is architected (à la WA{1,2,3}).

If it is implementation specific, then we can immediately stop
pretending that a guest running on those systems can be migrated.

The only thing it helps a bit with is big-little.

Overall, this isn't meant to be foolproof at all. It is only giving
people enough ammunition to make it plain that the cross-platform story
is a bit of a sad joke right now.

Thanks,

	M.
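For reference, this is roughly what the architected discovery path looks like from the guest side; a minimal sketch assuming only the standard SMCCC function IDs and return codes from include/linux/arm-smccc.h. The helper name is made up for illustration and is not kernel code; it only shows the query that KVM already intercepts and could answer on behalf of EL3:

#include <linux/arm-smccc.h>

/*
 * Hypothetical helper: ask the (possibly virtual) firmware whether the
 * architected Spectre-v2 workaround (WA_1) is implemented and needed.
 * Returns true if the guest should invoke ARM_SMCCC_ARCH_WORKAROUND_1.
 */
static bool wa1_mitigation_needed(void)
{
	struct arm_smccc_res res;

	/* SMCCC_ARCH_FEATURES probes support for a given function ID */
	arm_smccc_1_1_invoke(ARM_SMCCC_ARCH_FEATURES_FUNC_ID,
			     ARM_SMCCC_ARCH_WORKAROUND_1, &res);

	switch ((int)res.a0) {
	case SMCCC_RET_SUCCESS:
		return true;	/* workaround implemented and required */
	case SMCCC_ARCH_WORKAROUND_RET_UNAFFECTED:
		return false;	/* CPU not affected */
	default:
		return false;	/* not implemented: no firmware help */
	}
}

An implementation-specific erratum has no such standard query, which is why the discussion above treats those as a hard stop for migration.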
On Fri, Feb 07, 2025 at 02:31:12PM +0000, Marc Zyngier wrote:
> On Fri, 07 Feb 2025 14:08:44 +0000,
> Catalin Marinas <catalin.marinas@arm.com> wrote:
> > On Wed, Feb 05, 2025 at 01:22:22PM +0000, Shameer Kolothum wrote:
> > >  static inline bool is_midr_in_range(struct midr_range const *range)
> > >  {
> > > -	return midr_is_cpu_model_range(read_cpuid_id(), range->model,
> > > -				       range->rv_min, range->rv_max);
> > > +	int i;
> > > +
> > > +	if (!target_impl_cpu_num)
> > > +		return midr_is_cpu_model_range(read_cpuid_id(), range->model,
> > > +					       range->rv_min, range->rv_max);
> > > +
> > > +	for (i = 0; i < target_impl_cpu_num; i++) {
> > > +		if (midr_is_cpu_model_range(target_impl_cpus[i].midr,
> > > +					    range->model,
> > > +					    range->rv_min, range->rv_max))
> > > +			return true;
> > > +	}
> > > +	return false;
> > >  }
> >
> > It's an interesting approach, but how does this work in practice if an
> > erratum requires a firmware counterpart? Do we expect the firmware on all
> > machines involved to have workarounds for the other machines? Or is KVM
> > going to intercept those SMCs and pretend the EL3 counterpart is there?
>
> KVM already traps SMCs, and could do something on behalf of the guest
> (such as pretending that the mitigation has happened if not on the
> correct host) *IF* the mitigation is architected (à la WA{1,2,3}).

That's the main thing I had in mind. I don't think we have any other
errata that require firmware run-time discovery and interaction, though
you never know when we'll add a new one.

> If it is implementation specific, then we can immediately stop
> pretending that a guest running on those systems can be migrated.

Makes sense.

> The only thing it helps a bit with is big-little.

It does help a bit or, at least, we have some code for handling these
variations that can be extended. However, with this patchset, the host
only probes the availability of the workarounds on the SoC it booted. It
has no idea about the extra MIDRs the VMM picks and what the other
machines in the clouds support.

Anyway, let's hope the VMs only migrate between platforms that are
equally broken.
On Fri, 07 Feb 2025 18:10:08 +0000,
Catalin Marinas <catalin.marinas@arm.com> wrote:
>
> On Fri, Feb 07, 2025 at 02:31:12PM +0000, Marc Zyngier wrote:
> > On Fri, 07 Feb 2025 14:08:44 +0000,
> > Catalin Marinas <catalin.marinas@arm.com> wrote:
> > > On Wed, Feb 05, 2025 at 01:22:22PM +0000, Shameer Kolothum wrote:
> > > >  static inline bool is_midr_in_range(struct midr_range const *range)
> > > >  {
> > > > -	return midr_is_cpu_model_range(read_cpuid_id(), range->model,
> > > > -				       range->rv_min, range->rv_max);
> > > > +	int i;
> > > > +
> > > > +	if (!target_impl_cpu_num)
> > > > +		return midr_is_cpu_model_range(read_cpuid_id(), range->model,
> > > > +					       range->rv_min, range->rv_max);
> > > > +
> > > > +	for (i = 0; i < target_impl_cpu_num; i++) {
> > > > +		if (midr_is_cpu_model_range(target_impl_cpus[i].midr,
> > > > +					    range->model,
> > > > +					    range->rv_min, range->rv_max))
> > > > +			return true;
> > > > +	}
> > > > +	return false;
> > > >  }
> > >
> > > It's an interesting approach, but how does this work in practice if an
> > > erratum requires a firmware counterpart? Do we expect the firmware on all
> > > machines involved to have workarounds for the other machines? Or is KVM
> > > going to intercept those SMCs and pretend the EL3 counterpart is there?
> >
> > KVM already traps SMCs, and could do something on behalf of the guest
> > (such as pretending that the mitigation has happened if not on the
> > correct host) *IF* the mitigation is architected (à la WA{1,2,3}).
>
> That's the main thing I had in mind. I don't think we have any other
> errata that require firmware run-time discovery and interaction, though
> you never know when we'll add a new one.
>
> > If it is implementation specific, then we can immediately stop
> > pretending that a guest running on those systems can be migrated.
>
> Makes sense.
>
> > The only thing it helps a bit with is big-little.
>
> It does help a bit or, at least, we have some code for handling these
> variations that can be extended. However, with this patchset, the host
> only probes the availability of the workarounds on the SoC it booted. It
> has no idea about the extra MIDRs the VMM picks and what the other
> machines in the clouds support.

But that's the contract. The VMM has to be omniscient and know exactly
what it can safely migrate to. It literally says "trust me, I know what
I'm doing".

> Anyway, let's hope the VMs only migrate between platforms that are
> equally broken.

No shortage of that, I'm afraid! :)

	M.
diff --git a/arch/arm64/include/asm/cputype.h b/arch/arm64/include/asm/cputype.h
index 2a76f0e30006..975fbdfc9354 100644
--- a/arch/arm64/include/asm/cputype.h
+++ b/arch/arm64/include/asm/cputype.h
@@ -267,6 +267,15 @@ struct midr_range {
 #define MIDR_REV(m, v, r) MIDR_RANGE(m, v, r, v, r)
 #define MIDR_ALL_VERSIONS(m) MIDR_RANGE(m, 0, 0, 0xf, 0xf)
 
+struct target_impl_cpu {
+	u64 midr;
+	u64 revidr;
+	u64 aidr;
+};
+
+extern u32 target_impl_cpu_num;
+extern struct target_impl_cpu *target_impl_cpus;
+
 static inline bool midr_is_cpu_model_range(u32 midr, u32 model, u32 rv_min,
 					   u32 rv_max)
 {
@@ -278,8 +287,19 @@ static inline bool midr_is_cpu_model_range(u32 midr, u32 model, u32 rv_min,
 
 static inline bool is_midr_in_range(struct midr_range const *range)
 {
-	return midr_is_cpu_model_range(read_cpuid_id(), range->model,
-				       range->rv_min, range->rv_max);
+	int i;
+
+	if (!target_impl_cpu_num)
+		return midr_is_cpu_model_range(read_cpuid_id(), range->model,
+					       range->rv_min, range->rv_max);
+
+	for (i = 0; i < target_impl_cpu_num; i++) {
+		if (midr_is_cpu_model_range(target_impl_cpus[i].midr,
+					    range->model,
+					    range->rv_min, range->rv_max))
+			return true;
+	}
+	return false;
 }
 
 static inline bool
diff --git a/arch/arm64/include/asm/hypervisor.h b/arch/arm64/include/asm/hypervisor.h
index 409e239834d1..a12fd897c877 100644
--- a/arch/arm64/include/asm/hypervisor.h
+++ b/arch/arm64/include/asm/hypervisor.h
@@ -6,6 +6,7 @@
 
 void kvm_init_hyp_services(void);
 bool kvm_arm_hyp_service_available(u32 func_id);
+void kvm_arm_target_impl_cpu_init(void);
 
 #ifdef CONFIG_ARM_PKVM_GUEST
 void pkvm_init_hyp_services(void);
diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index 99b55893fc4e..1177a3094cf2 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -14,6 +14,9 @@
 #include <asm/kvm_asm.h>
 #include <asm/smp_plat.h>
 
+u32 target_impl_cpu_num;
+struct target_impl_cpu *target_impl_cpus;
+
 static bool __maybe_unused
 __is_affected_midr_range(const struct arm64_cpu_capabilities *entry,
 			 u32 midr, u32 revidr)
@@ -32,9 +35,20 @@ __is_affected_midr_range(const struct arm64_cpu_capabilities *entry,
 static bool __maybe_unused
 is_affected_midr_range(const struct arm64_cpu_capabilities *entry, int scope)
 {
-	WARN_ON(scope != SCOPE_LOCAL_CPU || preemptible());
-	return __is_affected_midr_range(entry, read_cpuid_id(),
-					read_cpuid(REVIDR_EL1));
+	int i;
+
+	if (!target_impl_cpu_num) {
+		WARN_ON(scope != SCOPE_LOCAL_CPU || preemptible());
+		return __is_affected_midr_range(entry, read_cpuid_id(),
+						read_cpuid(REVIDR_EL1));
+	}
+
+	for (i = 0; i < target_impl_cpu_num; i++) {
+		if (__is_affected_midr_range(entry, target_impl_cpus[i].midr,
+					     target_impl_cpus[i].revidr))
+			return true;
+	}
+	return false;
 }
 
 static bool __maybe_unused
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 72e876f37cd4..5c61d9d9f097 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -86,6 +86,7 @@
 #include <asm/kvm_host.h>
 #include <asm/mmu_context.h>
 #include <asm/mte.h>
+#include <asm/hypervisor.h>
 #include <asm/processor.h>
 #include <asm/smp.h>
 #include <asm/sysreg.h>
@@ -3679,6 +3680,7 @@ unsigned long cpu_get_elf_hwcap3(void)
 
 static void __init setup_boot_cpu_capabilities(void)
 {
+	kvm_arm_target_impl_cpu_init();
 	/*
 	 * The boot CPU's feature register values have been recorded. Detect
 	 * boot cpucaps and local cpucaps for the boot CPU, then enable and
diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
index ef3a69cc398e..a6926750faaf 100644
--- a/arch/arm64/kernel/image-vars.h
+++ b/arch/arm64/kernel/image-vars.h
@@ -49,6 +49,8 @@ PROVIDE(__pi_arm64_sw_feature_override	= arm64_sw_feature_override);
 PROVIDE(__pi_arm64_use_ng_mappings	= arm64_use_ng_mappings);
 #ifdef CONFIG_CAVIUM_ERRATUM_27456
 PROVIDE(__pi_cavium_erratum_27456_cpus	= cavium_erratum_27456_cpus);
+PROVIDE(__pi_target_impl_cpu_num	= target_impl_cpu_num);
+PROVIDE(__pi_target_impl_cpus		= target_impl_cpus);
 #endif
 PROVIDE(__pi__ctype			= _ctype);
 PROVIDE(__pi_memstart_offset_seed	= memstart_offset_seed);
diff --git a/drivers/firmware/smccc/kvm_guest.c b/drivers/firmware/smccc/kvm_guest.c
index f3319be20b36..f0b0154150b5 100644
--- a/drivers/firmware/smccc/kvm_guest.c
+++ b/drivers/firmware/smccc/kvm_guest.c
@@ -6,8 +6,11 @@
 #include <linux/bitmap.h>
 #include <linux/cache.h>
 #include <linux/kernel.h>
+#include <linux/memblock.h>
 #include <linux/string.h>
 
+#include <uapi/linux/psci.h>
+
 #include <asm/hypervisor.h>
 
 static DECLARE_BITMAP(__kvm_arm_hyp_services, ARM_SMCCC_KVM_NUM_FUNCS) __ro_after_init = { };
@@ -51,3 +54,63 @@ bool kvm_arm_hyp_service_available(u32 func_id)
 	return test_bit(func_id, __kvm_arm_hyp_services);
 }
 EXPORT_SYMBOL_GPL(kvm_arm_hyp_service_available);
+
+void __init kvm_arm_target_impl_cpu_init(void)
+{
+	int i;
+	u32 ver;
+	u64 max_cpus;
+	struct arm_smccc_res res;
+
+	/* Check if we have already set targets */
+	if (target_impl_cpu_num)
+		return;
+
+	if (!kvm_arm_hyp_service_available(ARM_SMCCC_KVM_FUNC_DISCOVER_IMPL_VER) ||
+	    !kvm_arm_hyp_service_available(ARM_SMCCC_KVM_FUNC_DISCOVER_IMPL_CPUS))
+		return;
+
+	arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_DISCOVER_IMPL_VER_FUNC_ID,
+			     0, &res);
+	if (res.a0 != SMCCC_RET_SUCCESS)
+		return;
+
+	/* Version info is in lower 32 bits and is in SMCCC_VERSION format */
+	ver = lower_32_bits(res.a1);
+	if (PSCI_VERSION_MAJOR(ver) != 1) {
+		pr_warn("Unsupported target CPU implementation version v%d.%d\n",
+			PSCI_VERSION_MAJOR(ver), PSCI_VERSION_MINOR(ver));
+		return;
+	}
+
+	if (!res.a2) {
+		pr_warn("No target implementation CPUs specified\n");
+		return;
+	}
+
+	max_cpus = res.a2;
+	target_impl_cpus = memblock_alloc(sizeof(*target_impl_cpus) * max_cpus,
+					  __alignof__(*target_impl_cpus));
+	if (!target_impl_cpus) {
+		pr_warn("Not enough memory for struct target_impl_cpu\n");
+		return;
+	}
+
+	for (i = 0; i < max_cpus; i++) {
+		arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_DISCOVER_IMPL_CPUS_FUNC_ID,
+				     i, &res);
+		if (res.a0 != SMCCC_RET_SUCCESS) {
+			memblock_free(target_impl_cpus,
+				      sizeof(*target_impl_cpus) * max_cpus);
+			target_impl_cpus = NULL;
+			pr_warn("Discovering target implementation CPUs failed\n");
+			return;
+		}
+		target_impl_cpus[i].midr = res.a1;
+		target_impl_cpus[i].revidr = res.a2;
+		target_impl_cpus[i].aidr = res.a3;
+	}
+
+	target_impl_cpu_num = max_cpus;
+	pr_info("Number of target implementation CPUs is %d\n", target_impl_cpu_num);
+}
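As a usage illustration only (not part of the patch): with the cputype.h hunk above, an erratum keyed on a MIDR range is enabled whenever any CPU in the hypervisor-provided target list falls in the range, not just the CPU the kernel booted on. The range below is a stand-in picked for the example, not a real errata table entry:

/* Illustrative only: a MIDR range as used by the errata framework. */
static const struct midr_range example_range =
	MIDR_RANGE(MIDR_CORTEX_A76, 0, 0, 1, 0);	/* r0p0..r1p0 */

static bool example_needs_workaround(void)
{
	/*
	 * Without a target list, is_midr_in_range() checks read_cpuid_id()
	 * only. When the hypervisor has supplied target implementation
	 * CPUs, it returns true if *any* listed MIDR is in range, so the
	 * workaround is enabled for the whole set of possible hosts.
	 */
	return is_midr_in_range(&example_range);
}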