Message ID | 20240517173926.965351-3-seanjc@google.com (mailing list archive) |
---|---|
State | New, archived |
Series | KVM: x86: CPUID overhaul, fixes, and caching |
On Fri, 2024-05-17 at 10:38 -0700, Sean Christopherson wrote:
> Explicitly perform runtime CPUID adjustments as part of the "after set
> CPUID" flow to guard against bugs where KVM consumes stale vCPU/CPUID
> state during kvm_update_cpuid_runtime(). E.g. see commit 4736d85f0d18
> ("KVM: x86: Use actual kvm_cpuid.base for clearing KVM_FEATURE_PV_UNHALT").
>
> Whacking each mole individually is not sustainable or robust, e.g. while
> the aforementioned commit fixed KVM's PV features, the same issue lurks for
> Xen and Hyper-V features; Xen and Hyper-V simply don't have any runtime
> features (though spoiler alert, neither should KVM).
>
> Updating runtime features in the "full" path will also simplify adding a
> snapshot of the guest's capabilities, i.e. of caching the intersection of
> guest CPUID and kvm_cpu_caps (modulo a few edge cases).
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/cpuid.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 2b19ff991ceb..e60ffb421e4b 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -345,6 +345,8 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>  	bitmap_zero(vcpu->arch.governed_features.enabled,
>  		    KVM_MAX_NR_GOVERNED_FEATURES);
>  
> +	kvm_update_cpuid_runtime(vcpu);
> +
>  	/*
>  	 * If TDP is enabled, let the guest use GBPAGES if they're supported in
>  	 * hardware. The hardware page walker doesn't let KVM disable GBPAGES,
> @@ -426,8 +428,6 @@ static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
>  {
>  	int r;
>  
> -	__kvm_update_cpuid_runtime(vcpu, e2, nent);
> -
>  	/*
>  	 * KVM does not correctly handle changing guest CPUID after KVM_RUN, as
>  	 * MAXPHYADDR, GBPAGES support, AMD reserved bit behavior, etc.. aren't
> @@ -440,6 +440,15 @@ static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
>  	 * whether the supplied CPUID data is equal to what's already set.
>  	 */
>  	if (kvm_vcpu_has_run(vcpu)) {
> +		/*
> +		 * Note, runtime CPUID updates may consume other CPUID-driven
> +		 * vCPU state, e.g. KVM or Xen CPUID bases. Updating runtime
> +		 * state before full CPUID processing is functionally correct
> +		 * only because any change in CPUID is disallowed, i.e. using
> +		 * stale data is ok because KVM will reject the change.
> +		 */

If I understand correctly the sole reason for the below
__kvm_update_cpuid_runtime is to ensure that kvm_cpuid_check_equal doesn't
fail because current cpuid also was post-processed with runtime updates.

Can we have a comment stating this? Or even better how about moving the
call to __kvm_update_cpuid_runtime into the kvm_cpuid_check_equal,
to emphasize this?

> +		__kvm_update_cpuid_runtime(vcpu, e2, nent);
> +
>  		r = kvm_cpuid_check_equal(vcpu, e2, nent);
>  		if (r)
>  			return r;

Overall I am not 100% sure what is better:

Before the patch it was roughly like this:

1. Post process the user given cpuid with bits of KVM runtime state (like xcr0)
   At that point the vcpu->arch.cpuid_entries is stale but consistent, it is just old CPUID.

2. kvm_hv_vcpu_init call (IMHO this call can be moved to kvm_vcpu_after_set_cpuid)

3. kvm_check_cpuid on the user provided cpuid

4. Update the vcpu->arch.cpuid_entries with new and post processed cpuid

5. kvm_get_hypervisor_cpuid - I think this also can be cosmetically moved to kvm_vcpu_after_set_cpuid

6. kvm_vcpu_after_set_cpuid itself.


After this change it works like that:

1. kvm_hv_vcpu_init (again this belongs more to kvm_vcpu_after_set_cpuid)
2. kvm_check_cpuid on the user cpuid without post processing - in theory this can cause bugs
3. Update the vcpu->arch.cpuid_entries with new cpuid but without post-processing
4. kvm_get_hypervisor_cpuid
5. kvm_update_cpuid_runtime
6. The old kvm_vcpu_after_set_cpuid

I'm honestly not sure what is better but IMHO moving the kvm_hv_vcpu_init and
kvm_get_hypervisor_cpuid into kvm_vcpu_after_set_cpuid would clean up this
mess a bit regardless of this patch.

Best regards,
	Maxim Levitsky
On Thu, Jul 04, 2024, Maxim Levitsky wrote:
> On Fri, 2024-05-17 at 10:38 -0700, Sean Christopherson wrote:
> >  	/*
> >  	 * KVM does not correctly handle changing guest CPUID after KVM_RUN, as
> >  	 * MAXPHYADDR, GBPAGES support, AMD reserved bit behavior, etc.. aren't
> > @@ -440,6 +440,15 @@ static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
> >  	 * whether the supplied CPUID data is equal to what's already set.
> >  	 */
> >  	if (kvm_vcpu_has_run(vcpu)) {
> > +		/*
> > +		 * Note, runtime CPUID updates may consume other CPUID-driven
> > +		 * vCPU state, e.g. KVM or Xen CPUID bases. Updating runtime
> > +		 * state before full CPUID processing is functionally correct
> > +		 * only because any change in CPUID is disallowed, i.e. using
> > +		 * stale data is ok because KVM will reject the change.
> > +		 */
>
> If I understand correctly the sole reason for the below
> __kvm_update_cpuid_runtime is to ensure that kvm_cpuid_check_equal doesn't
> fail because current cpuid also was post-processed with runtime updates.

Yep.

> Can we have a comment stating this? Or even better how about moving the
> call to __kvm_update_cpuid_runtime into the kvm_cpuid_check_equal,
> to emphasize this?

Ya, I'll do both.

> > +		__kvm_update_cpuid_runtime(vcpu, e2, nent);
> > +
> >  		r = kvm_cpuid_check_equal(vcpu, e2, nent);
> >  		if (r)
> >  			return r;
>
> Overall I am not 100% sure what is better:
>
> Before the patch it was roughly like this:
>
> 1. Post process the user given cpuid with bits of KVM runtime state (like xcr0)
>    At that point the vcpu->arch.cpuid_entries is stale but consistent, it is just old CPUID.
>
> 2. kvm_hv_vcpu_init call (IMHO this call can be moved to kvm_vcpu_after_set_cpuid)
>
> 3. kvm_check_cpuid on the user provided cpuid
>
> 4. Update the vcpu->arch.cpuid_entries with new and post processed cpuid
>
> 5. kvm_get_hypervisor_cpuid - I think this also can be cosmetically moved to kvm_vcpu_after_set_cpuid
>
> 6. kvm_vcpu_after_set_cpuid itself.
>
>
> After this change it works like that:
>
> 1. kvm_hv_vcpu_init (again this belongs more to kvm_vcpu_after_set_cpuid)
> 2. kvm_check_cpuid on the user cpuid without post processing - in theory this can cause bugs
> 3. Update the vcpu->arch.cpuid_entries with new cpuid but without post-processing
> 4. kvm_get_hypervisor_cpuid
> 5. kvm_update_cpuid_runtime
> 6. The old kvm_vcpu_after_set_cpuid
>
> I'm honestly not sure what is better but IMHO moving the kvm_hv_vcpu_init and
> kvm_get_hypervisor_cpuid into kvm_vcpu_after_set_cpuid would clean up this
> mess a bit regardless of this patch.

It takes many more patches, but doing the swap() allows for the removal of several
APIs that poke into a "raw" kvm_cpuid_entry2 array, and by the end of the series
(with your above feedback addressed) the code gets to (sans comments):

	swap(vcpu->arch.cpuid_entries, e2);
	swap(vcpu->arch.cpuid_nent, nent);

	memcpy(vcpu_caps, vcpu->arch.cpu_caps, sizeof(vcpu_caps));
	BUILD_BUG_ON(sizeof(vcpu_caps) != sizeof(vcpu->arch.cpu_caps));

	if (kvm_vcpu_has_run(vcpu)) {
		r = kvm_cpuid_check_equal(vcpu, e2, nent);
		if (r)
			goto err;
		goto success;
	}

Those are really just bonuses though, the main goal is to prevent recurrences of
bugs where KVM consumes stale vCPU state[*], which is what prompted this change.

[*] https://lore.kernel.org/all/20240228101837.93642-1-vkuznets@redhat.com
On Tue, 2024-07-09 at 12:46 -0700, Sean Christopherson wrote:
> On Thu, Jul 04, 2024, Maxim Levitsky wrote:
> > On Fri, 2024-05-17 at 10:38 -0700, Sean Christopherson wrote:
> > >  	/*
> > >  	 * KVM does not correctly handle changing guest CPUID after KVM_RUN, as
> > >  	 * MAXPHYADDR, GBPAGES support, AMD reserved bit behavior, etc.. aren't
> > > @@ -440,6 +440,15 @@ static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
> > >  	 * whether the supplied CPUID data is equal to what's already set.
> > >  	 */
> > >  	if (kvm_vcpu_has_run(vcpu)) {
> > > +		/*
> > > +		 * Note, runtime CPUID updates may consume other CPUID-driven
> > > +		 * vCPU state, e.g. KVM or Xen CPUID bases. Updating runtime
> > > +		 * state before full CPUID processing is functionally correct
> > > +		 * only because any change in CPUID is disallowed, i.e. using
> > > +		 * stale data is ok because KVM will reject the change.
> > > +		 */
> >
> > If I understand correctly the sole reason for the below
> > __kvm_update_cpuid_runtime is to ensure that kvm_cpuid_check_equal doesn't
> > fail because current cpuid also was post-processed with runtime updates.
>
> Yep.
>
> > Can we have a comment stating this? Or even better how about moving the
> > call to __kvm_update_cpuid_runtime into the kvm_cpuid_check_equal,
> > to emphasize this?
>
> Ya, I'll do both.
>
> > > +		__kvm_update_cpuid_runtime(vcpu, e2, nent);
> > > +
> > >  		r = kvm_cpuid_check_equal(vcpu, e2, nent);
> > >  		if (r)
> > >  			return r;
> >
> > Overall I am not 100% sure what is better:
> >
> > Before the patch it was roughly like this:
> >
> > 1. Post process the user given cpuid with bits of KVM runtime state (like xcr0)
> >    At that point the vcpu->arch.cpuid_entries is stale but consistent, it is just old CPUID.
> >
> > 2. kvm_hv_vcpu_init call (IMHO this call can be moved to kvm_vcpu_after_set_cpuid)
> >
> > 3. kvm_check_cpuid on the user provided cpuid
> >
> > 4. Update the vcpu->arch.cpuid_entries with new and post processed cpuid
> >
> > 5. kvm_get_hypervisor_cpuid - I think this also can be cosmetically moved to kvm_vcpu_after_set_cpuid
> >
> > 6. kvm_vcpu_after_set_cpuid itself.
> >
> >
> > After this change it works like that:
> >
> > 1. kvm_hv_vcpu_init (again this belongs more to kvm_vcpu_after_set_cpuid)
> > 2. kvm_check_cpuid on the user cpuid without post processing - in theory this can cause bugs
> > 3. Update the vcpu->arch.cpuid_entries with new cpuid but without post-processing
> > 4. kvm_get_hypervisor_cpuid
> > 5. kvm_update_cpuid_runtime
> > 6. The old kvm_vcpu_after_set_cpuid
> >
> > I'm honestly not sure what is better but IMHO moving the kvm_hv_vcpu_init and
> > kvm_get_hypervisor_cpuid into kvm_vcpu_after_set_cpuid would clean up this
> > mess a bit regardless of this patch.
>
> It takes many more patches, but doing the swap() allows for the removal of several
> APIs that poke into a "raw" kvm_cpuid_entry2 array, and by the end of the series
> (with your above feedback addressed) the code gets to (sans comments):
>
> 	swap(vcpu->arch.cpuid_entries, e2);
> 	swap(vcpu->arch.cpuid_nent, nent);
>
> 	memcpy(vcpu_caps, vcpu->arch.cpu_caps, sizeof(vcpu_caps));
> 	BUILD_BUG_ON(sizeof(vcpu_caps) != sizeof(vcpu->arch.cpu_caps));
>
> 	if (kvm_vcpu_has_run(vcpu)) {
> 		r = kvm_cpuid_check_equal(vcpu, e2, nent);
> 		if (r)
> 			goto err;
> 		goto success;
> 	}
>
> Those are really just bonuses though, the main goal is to prevent recurrences of
> bugs where KVM consumes stale vCPU state[*], which is what prompted this change.
>
> [*] https://lore.kernel.org/all/20240228101837.93642-1-vkuznets@redhat.com
>

All makes sense, thanks!

Best regards,
	Maxim Levitsky
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 2b19ff991ceb..e60ffb421e4b 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -345,6 +345,8 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	bitmap_zero(vcpu->arch.governed_features.enabled,
 		    KVM_MAX_NR_GOVERNED_FEATURES);
 
+	kvm_update_cpuid_runtime(vcpu);
+
 	/*
 	 * If TDP is enabled, let the guest use GBPAGES if they're supported in
 	 * hardware. The hardware page walker doesn't let KVM disable GBPAGES,
@@ -426,8 +428,6 @@ static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
 {
 	int r;
 
-	__kvm_update_cpuid_runtime(vcpu, e2, nent);
-
 	/*
 	 * KVM does not correctly handle changing guest CPUID after KVM_RUN, as
 	 * MAXPHYADDR, GBPAGES support, AMD reserved bit behavior, etc.. aren't
@@ -440,6 +440,15 @@ static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
 	 * whether the supplied CPUID data is equal to what's already set.
 	 */
 	if (kvm_vcpu_has_run(vcpu)) {
+		/*
+		 * Note, runtime CPUID updates may consume other CPUID-driven
+		 * vCPU state, e.g. KVM or Xen CPUID bases. Updating runtime
+		 * state before full CPUID processing is functionally correct
+		 * only because any change in CPUID is disallowed, i.e. using
+		 * stale data is ok because KVM will reject the change.
+		 */
+		__kvm_update_cpuid_runtime(vcpu, e2, nent);
+
 		r = kvm_cpuid_check_equal(vcpu, e2, nent);
 		if (r)
 			return r;
Explicitly perform runtime CPUID adjustments as part of the "after set
CPUID" flow to guard against bugs where KVM consumes stale vCPU/CPUID
state during kvm_update_cpuid_runtime(). E.g. see commit 4736d85f0d18
("KVM: x86: Use actual kvm_cpuid.base for clearing KVM_FEATURE_PV_UNHALT").

Whacking each mole individually is not sustainable or robust, e.g. while
the aforementioned commit fixed KVM's PV features, the same issue lurks for
Xen and Hyper-V features; Xen and Hyper-V simply don't have any runtime
features (though spoiler alert, neither should KVM).

Updating runtime features in the "full" path will also simplify adding a
snapshot of the guest's capabilities, i.e. of caching the intersection of
guest CPUID and kvm_cpu_caps (modulo a few edge cases).

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/cpuid.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)