Message ID | 20211221090449.15337-4-kechenl@nvidia.com (mailing list archive)
---|---
State | New, archived
Series | KVM: x86: add per-vCPU exits disable capability
On Tue, Dec 21, 2021, Kechen Lu wrote:
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index d5d0d99b584e..d7b4a3e360bb 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5072,6 +5072,18 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
>  		kvm_update_pv_runtime(vcpu);
>  
>  		return 0;
> +
> +	case KVM_CAP_X86_DISABLE_EXITS:
> +		if (cap->args[0] && (cap->args[0] &
> +			~KVM_X86_DISABLE_VALID_EXITS))

Bad alignment, but there's no need for the !0 in the first place, i.e.

	if (cap->args[0] & ~KVM_X86_DISABLE_VALID_EXITS)

but that's also incomplete as this path only supports toggling HLT, yet allows
all flavors of KVM_X86_DISABLE_VALID_EXITS.  Unless there's a good reason to not
allow manipulating the other exits, the proper fix is to just support everything.

> +			return -EINVAL;
> +
> +		vcpu->arch.hlt_in_guest = (cap->args[0] &
> +			KVM_X86_DISABLE_EXITS_HLT) ? true : false;

Hmm, this behavior diverges from the per-VM ioctl, which doesn't allow
re-enabling a disabled exit.  We can't change the per-VM behavior without
breaking backwards compatibility, e.g. if userspace does:

	if (allow_mwait)
		kvm_vm_disable_exit(KVM_X86_DISABLE_EXITS_MWAIT)
	if (allow_hlt)
		kvm_vm_disable_exit(KVM_X86_DISABLE_EXITS_HLT)

then changing KVM behavior would result in MWAIT being intercepted when
previously it would have been allowed.  We have a userspace VMM that operates
like this...

Does your use case require toggling intercepts?  Or is the configuration static?
If it's static, then the easiest thing would be to follow the per-VM behavior so
that there are no surprises.  If toggling is required, then I think the best
thing would be to add a prep patch to add an override flag to the per-VM ioctl,
and then share code between the per-VM and per-vCPU paths for modifying the
flags (attached as patch 0003).

Somewhat related, there's another bug of sorts that I think we can safely fix.
KVM doesn't reject disabling of MWAIT exits when MWAIT isn't allowed in the
guest, and instead ignores the bad input.  Not a big deal, but fixing that means
KVM doesn't need to check kvm_can_mwait_in_guest() when processing the args to
update flags.  If that breaks someone's userspace, the alternative would be to
tweak the attached patch 0003 to introduce the OVERRIDE, e.g.

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f611a49ceba4..3bac756bab79 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5053,6 +5053,8 @@ static int kvm_vcpu_ioctl_device_attr(struct kvm_vcpu *vcpu,
 
 #define kvm_ioctl_disable_exits(a, mask)				    \
 ({									    \
+	if (!kvm_can_mwait_in_guest())					    \
+		(mask) &= KVM_X86_DISABLE_EXITS_MWAIT;			    \
 	if ((mask) & KVM_X86_DISABLE_EXITS_OVERRIDE) {			    \
 		(a).mwait_in_guest = (mask) & KVM_X86_DISABLE_EXITS_MWAIT;  \
 		(a).hlt_in_guest = (mask) & KVM_X86_DISABLE_EXITS_HLT;	    \

If toggling is not required, then I still think it makes sense to add a macro to
handle propagating the capability args to the arch flags.
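[To make the compatibility concern above concrete, here is a minimal sketch of
the userspace pattern Sean describes; `vm_fd`, `configure_exits()` and the
`allow_*` flags are illustrative, not taken from any VMM cited in the thread.
Each exit is disabled by a separate KVM_ENABLE_CAP call, so KVM must not treat
an absent bit as a request to re-enable interception.]

```c
#include <stdbool.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Disable one class of exits on the VM fd; names other than the KVM_*
 * UAPI symbols are illustrative. */
static int kvm_vm_disable_exit(int vm_fd, __u64 exit_bit)
{
	struct kvm_enable_cap cap = {
		.cap = KVM_CAP_X86_DISABLE_EXITS,
		.args[0] = exit_bit,
	};

	return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
}

static void configure_exits(int vm_fd, bool allow_mwait, bool allow_hlt)
{
	/* One call per exit: bits accumulate, they are never cleared. */
	if (allow_mwait)
		kvm_vm_disable_exit(vm_fd, KVM_X86_DISABLE_EXITS_MWAIT);
	if (allow_hlt)
		kvm_vm_disable_exit(vm_fd, KVM_X86_DISABLE_EXITS_HLT);
}
```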
On Mon, Jan 10, 2022, Sean Christopherson wrote:
> Does your use case require toggling intercepts?  Or is the configuration static?
> If it's static, then the easiest thing would be to follow the per-VM behavior so
> that there are no surprises.  If toggling is required, then I think the best
> thing would be to add a prep patch to add an override flag to the per-VM ioctl,
> and then share code between the per-VM and per-vCPU paths for modifying the
> flags (attached as patch 0003).

...

> If toggling is not required, then I still think it makes sense to add a macro to
> handle propagating the capability args to the arch flags.

Almost forgot.  Can you please add selftests to verify whatever per-VM and
per-vCPU behavior we end up implementing?  Thanks!
> -----Original Message-----
> From: Sean Christopherson <seanjc@google.com>
> Sent: Monday, January 10, 2022 12:56 PM
> To: Kechen Lu <kechenl@nvidia.com>
> Cc: kvm@vger.kernel.org; pbonzini@redhat.com; wanpengli@tencent.com;
> vkuznets@redhat.com; mst@redhat.com; Somdutta Roy <somduttar@nvidia.com>;
> linux-kernel@vger.kernel.org
> Subject: Re: [RFC PATCH v2 3/3] KVM: x86: add vCPU ioctl for HLT exits
> disable capability
>
> External email: Use caution opening links or attachments
>
>
> On Tue, Dec 21, 2021, Kechen Lu wrote:
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index d5d0d99b584e..d7b4a3e360bb 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -5072,6 +5072,18 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
> >  		kvm_update_pv_runtime(vcpu);
> >
> >  		return 0;
> > +
> > +	case KVM_CAP_X86_DISABLE_EXITS:
> > +		if (cap->args[0] && (cap->args[0] &
> > +			~KVM_X86_DISABLE_VALID_EXITS))
>
> Bad alignment, but there's no need for the !0 in the first place, i.e.
>
> 	if (cap->args[0] & ~KVM_X86_DISABLE_VALID_EXITS)
>

Ack.

> but that's also incomplete as this path only supports toggling HLT, yet allows
> all flavors of KVM_X86_DISABLE_VALID_EXITS.  Unless there's a good reason
> to not allow manipulating the other exits, the proper fix is to just support
> everything.
>

Makes sense. When I implemented this patch version, I was only thinking about
the HLT intercept use case and wanted more comments on whether this framework
looks good. Will complete this to support the other exits.

> > +			return -EINVAL;
> > +
> > +		vcpu->arch.hlt_in_guest = (cap->args[0] &
> > +			KVM_X86_DISABLE_EXITS_HLT) ? true : false;
>
> Hmm, this behavior diverges from the per-VM ioctl, which doesn't allow
> re-enabling a disabled exit.  We can't change the per-VM behavior without
> breaking backwards compatibility, e.g. if userspace does:
>
> 	if (allow_mwait)
> 		kvm_vm_disable_exit(KVM_X86_DISABLE_EXITS_MWAIT)
> 	if (allow_hlt)
> 		kvm_vm_disable_exit(KVM_X86_DISABLE_EXITS_HLT)
>
> then changing KVM behavior would result in MWAIT being intercepted when
> previously it would have been allowed.  We have a userspace VMM that
> operates like this...
>
> Does your use case require toggling intercepts?  Or is the configuration
> static?  If it's static, then the easiest thing would be to follow the per-VM
> behavior so that there are no surprises.  If toggling is required, then I
> think the best thing would be to add a prep patch to add an override flag to
> the per-VM ioctl, and then share code between the per-VM and per-vCPU paths
> for modifying the flags (attached as patch 0003).
>

Our use case for now is a static configuration. But since the per-vCPU ioctl is
executed at runtime after vCPU creation anyway, it effectively is "toggling"
and needs overrides on some vCPUs. The "OVERRIDE" flag makes much sense to me,
and the macro looks clean and neat for reducing redundant code. Thanks a lot
for the patch.

> Somewhat related, there's another bug of sorts that I think we can safely fix.
> KVM doesn't reject disabling of MWAIT exits when MWAIT isn't allowed in
> the guest, and instead ignores the bad input.  Not a big deal, but fixing that
> means KVM doesn't need to check kvm_can_mwait_in_guest() when
> processing the args to update flags.  If that breaks someone's userspace, the
> alternative would be to tweak the attached patch 0003 to introduce the
> OVERRIDE, e.g.
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index f611a49ceba4..3bac756bab79 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5053,6 +5053,8 @@ static int kvm_vcpu_ioctl_device_attr(struct kvm_vcpu *vcpu,
>
>  #define kvm_ioctl_disable_exits(a, mask)				    \
>  ({									    \
> +	if (!kvm_can_mwait_in_guest())					    \
> +		(mask) &= KVM_X86_DISABLE_EXITS_MWAIT;			    \

For backward compatibility with some userspaces, adding this tweak to the
attached patch 0003 makes sense. BTW, (mask) &= KVM_X86_DISABLE_EXITS_MWAIT
seems like it should be (mask) &= ~KVM_X86_DISABLE_EXITS_MWAIT; I guess it's a
typo :P.

Will include the attached patch 0001 in the next version as well. Thanks for
all the help!

Best Regards,
Kechen

>  	if ((mask) & KVM_X86_DISABLE_EXITS_OVERRIDE) {			    \
>  		(a).mwait_in_guest = (mask) & KVM_X86_DISABLE_EXITS_MWAIT;  \
>  		(a).hlt_in_guest = (mask) & KVM_X86_DISABLE_EXITS_HLT;	    \
>
>
> If toggling is not required, then I still think it makes sense to add a macro to
> handle propagating the capability args to the arch flags.
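[To make the OVERRIDE semantics discussed above concrete, here is a small
self-contained userspace model of the two update policies, with the mask typo
corrected. The flag values, struct and function names are illustrative; only
the MWAIT/HLT/OVERRIDE names mirror the thread's symbols, and the non-OVERRIDE
branch is a guess at the legacy sticky behavior, since patch 0003 itself is
not included in this archive.]

```c
#include <stdbool.h>
#include <stdio.h>

#define DISABLE_EXITS_MWAIT	(1u << 0)
#define DISABLE_EXITS_HLT	(1u << 1)
#define DISABLE_EXITS_OVERRIDE	(1u << 31)	/* hypothetical flag from patch 0003 */

struct exits {
	bool mwait_in_guest;
	bool hlt_in_guest;
};

static void disable_exits(struct exits *e, unsigned int mask)
{
	if (mask & DISABLE_EXITS_OVERRIDE) {
		/* Toggling: absent bits re-enable interception. */
		e->mwait_in_guest = mask & DISABLE_EXITS_MWAIT;
		e->hlt_in_guest = mask & DISABLE_EXITS_HLT;
	} else {
		/* Legacy sticky behavior: bits only accumulate. */
		e->mwait_in_guest |= !!(mask & DISABLE_EXITS_MWAIT);
		e->hlt_in_guest |= !!(mask & DISABLE_EXITS_HLT);
	}
}

int main(void)
{
	struct exits e = { 0 };

	disable_exits(&e, DISABLE_EXITS_MWAIT);
	disable_exits(&e, DISABLE_EXITS_HLT);		/* MWAIT stays disabled */
	printf("sticky: mwait=%d hlt=%d\n", e.mwait_in_guest, e.hlt_in_guest);

	disable_exits(&e, DISABLE_EXITS_OVERRIDE | DISABLE_EXITS_HLT);
	printf("override: mwait=%d hlt=%d\n", e.mwait_in_guest, e.hlt_in_guest);
	return 0;
}
```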
> -----Original Message-----
> From: Sean Christopherson <seanjc@google.com>
> Sent: Monday, January 10, 2022 1:01 PM
> To: Kechen Lu <kechenl@nvidia.com>
> Cc: kvm@vger.kernel.org; pbonzini@redhat.com; wanpengli@tencent.com;
> vkuznets@redhat.com; mst@redhat.com; Somdutta Roy <somduttar@nvidia.com>;
> linux-kernel@vger.kernel.org
> Subject: Re: [RFC PATCH v2 3/3] KVM: x86: add vCPU ioctl for HLT exits
> disable capability
>
> External email: Use caution opening links or attachments
>
>
> On Mon, Jan 10, 2022, Sean Christopherson wrote:
> > Does your use case require toggling intercepts?  Or is the configuration
> > static?  If it's static, then the easiest thing would be to follow the
> > per-VM behavior so that there are no surprises.  If toggling is required,
> > then I think the best thing would be to add a prep patch to add an override
> > flag to the per-VM ioctl, and then share code between the per-VM and
> > per-vCPU paths for modifying the flags (attached as patch 0003).
>
> ...
>
> > If toggling is not required, then I still think it makes sense to add
> > a macro to handle propagating the capability args to the arch flags.
>
> Almost forgot.  Can you please add selftests to verify whatever per-VM and
> per-vCPU behavior we end up implementing?  Thanks!

Sure, will add selftests for the per-VM and per-vCPU disable exits cap.

Thanks,
Kechen
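[A rough sketch of what such a test could verify, using raw KVM ioctls rather
than the kvm selftest harness, whose helper APIs changed frequently around
this time. The expected per-vCPU success assumes the support this series
proposes is in place; the invalid-bits rejection follows the documented
per-VM behavior.]

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

int main(void)
{
	struct kvm_enable_cap cap = { .cap = KVM_CAP_X86_DISABLE_EXITS };
	int kvm, vm, vcpu;

	kvm = open("/dev/kvm", O_RDWR);
	assert(kvm >= 0);
	vm = ioctl(kvm, KVM_CREATE_VM, 0);
	assert(vm >= 0);
	vcpu = ioctl(vm, KVM_CREATE_VCPU, 0);
	assert(vcpu >= 0);

	/* Invalid bits must be rejected (-EINVAL) on both scopes; the per-VM
	 * call also fails here because a vCPU has already been created. */
	cap.args[0] = ~0ull;
	assert(ioctl(vm, KVM_ENABLE_CAP, &cap) == -1);
	assert(ioctl(vcpu, KVM_ENABLE_CAP, &cap) == -1);

	/* Disabling HLT exits per-vCPU should succeed with the new support. */
	cap.args[0] = KVM_X86_DISABLE_EXITS_HLT;
	assert(ioctl(vcpu, KVM_ENABLE_CAP, &cap) == 0);
	return 0;
}
```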
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index d1c50b95bbc1..b340e36a34f3 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6581,7 +6581,7 @@ branch to guests' 0x200 interrupt vector.
 :Architectures: x86
 :Parameters: args[0] defines which exits are disabled
 :Returns: 0 on success, -EINVAL when args[0] contains invalid exits
-          or if any vCPU has already been created
+          or if any vCPU has already been created for vm ioctl
 
 Valid bits in args[0] are::
 
@@ -6595,7 +6595,8 @@ longer intercept some instructions for improved latency in some workloads,
 and is suggested when vCPUs are associated to dedicated physical CPUs.
 More bits can be added in the future; userspace can just pass the
 KVM_CHECK_EXTENSION result to KVM_ENABLE_CAP to disable
-all such vmexits.
+all such vmexits. vCPU-scoped capability support is also added for
+HLT exits.
 
 Do not enable KVM_FEATURE_PV_UNHALT if you disable HLT exits.
 
diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index cefe1d81e2e8..3e54400535f2 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -121,6 +121,7 @@ KVM_X86_OP_NULL(enable_direct_tlbflush)
 KVM_X86_OP_NULL(migrate_timers)
 KVM_X86_OP(msr_filter_changed)
 KVM_X86_OP_NULL(complete_emulated_msr)
+KVM_X86_OP(update_disabled_exits)
 
 #undef KVM_X86_OP
 #undef KVM_X86_OP_NULL
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index edc5fca4d8c8..20a1f34c772c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1498,6 +1498,8 @@ struct kvm_x86_ops {
 	int (*complete_emulated_msr)(struct kvm_vcpu *vcpu, int err);
 
 	void (*vcpu_deliver_sipi_vector)(struct kvm_vcpu *vcpu, u8 vector);
+
+	void (*update_disabled_exits)(struct kvm_vcpu *vcpu);
 };
 
 struct kvm_x86_nested_ops {
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 70e393c6dfb5..6cad069a540a 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4556,6 +4556,14 @@ static void svm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
 	sev_vcpu_deliver_sipi_vector(vcpu, vector);
 }
 
+static void svm_update_disabled_exits(struct kvm_vcpu *vcpu)
+{
+	if (kvm_hlt_in_guest(vcpu))
+		svm_clr_intercept(to_svm(vcpu), INTERCEPT_HLT);
+	else
+		svm_set_intercept(to_svm(vcpu), INTERCEPT_HLT);
+}
+
 static void svm_vm_destroy(struct kvm *kvm)
 {
 	avic_vm_destroy(kvm);
@@ -4705,6 +4713,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.complete_emulated_msr = svm_complete_emulated_msr,
 
 	.vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
+
+	.update_disabled_exits = svm_update_disabled_exits,
 };
 
 static struct kvm_x86_init_ops svm_init_ops __initdata = {
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index b5133619dea1..149eb621b124 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7536,6 +7536,14 @@ static bool vmx_check_apicv_inhibit_reasons(ulong bit)
 	return supported & BIT(bit);
 }
 
+static void vmx_update_disabled_exits(struct kvm_vcpu *vcpu)
+{
+	if (kvm_hlt_in_guest(vcpu))
+		exec_controls_clearbit(to_vmx(vcpu), CPU_BASED_HLT_EXITING);
+	else
+		exec_controls_setbit(to_vmx(vcpu), CPU_BASED_HLT_EXITING);
+}
+
 static struct kvm_x86_ops vmx_x86_ops __initdata = {
 	.name = "kvm_intel",
 
@@ -7672,6 +7680,8 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
 	.complete_emulated_msr = kvm_complete_insn_gp,
 
 	.vcpu_deliver_sipi_vector = kvm_vcpu_deliver_sipi_vector,
+
+	.update_disabled_exits = vmx_update_disabled_exits,
 };
 
 static __init void vmx_setup_user_return_msrs(void)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d5d0d99b584e..d7b4a3e360bb 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5072,6 +5072,18 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
 		kvm_update_pv_runtime(vcpu);
 
 		return 0;
+
+	case KVM_CAP_X86_DISABLE_EXITS:
+		if (cap->args[0] && (cap->args[0] &
+			~KVM_X86_DISABLE_VALID_EXITS))
+			return -EINVAL;
+
+		vcpu->arch.hlt_in_guest = (cap->args[0] &
+			KVM_X86_DISABLE_EXITS_HLT) ? true : false;
+
+		static_call(kvm_x86_update_disabled_exits)(vcpu);
+		return 0;
+
 	default:
 		return -EINVAL;
 	}
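[For readers unfamiliar with the kvm-x86-ops machinery the patch hooks into:
each KVM_X86_OP(name) entry declares a static call that x86.c updates from the
vendor ops table, so static_call(kvm_x86_update_disabled_exits)(vcpu)
dispatches to svm_update_disabled_exits() or vmx_update_disabled_exits().
Below is a simplified userspace model of that dispatch; the real kernel uses
DEFINE_STATIC_CALL_NULL()/static_call_update() generated from kvm-x86-ops.h,
not a plain function pointer, and all names here are illustrative.]

```c
#include <stdio.h>

struct vcpu { int hlt_in_guest; };

static void svm_update(struct vcpu *v)
{
	printf("SVM: %s HLT intercept\n", v->hlt_in_guest ? "clear" : "set");
}

static void vmx_update(struct vcpu *v)
{
	printf("VMX: %s HLT exiting\n", v->hlt_in_guest ? "clear" : "set");
}

/* Set once at module init to the vendor implementation. */
static void (*update_disabled_exits)(struct vcpu *v);

int main(void)
{
	struct vcpu v = { .hlt_in_guest = 1 };

	update_disabled_exits = vmx_update;	/* kvm_intel loaded */
	update_disabled_exits(&v);		/* models static_call(...)(vcpu) */
	return 0;
}
```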
Introduce support for a vCPU-scoped ioctl with the KVM_CAP_X86_DISABLE_EXITS
cap for disabling exits, to enable finer-grained VM exit disabling per vCPU
instead of for the whole guest. This patch enables the vCPU-scoped exits
control for HLT VM-exits.

In use cases like a Windows guest running heavy CPU-bound workloads, disabling
HLT VM-exits can mitigate host scheduler context switch overhead. Simply
disabling HLT exits on all vCPUs brings performance benefits, but if no pCPUs
are reserved for host threads, forced preemption can occur because the host
does not know when it can schedule other host threads that want to run. With
this patch, we can disable HLT exits on only part of a guest's vCPUs, which
keeps the performance benefits while also showing resiliency to host stressing
workloads running at the same time. In a host stressing workload experiment
with a Windows guest running heavy CPU-bound workloads, it shows good
resiliency and a ~3% performance improvement.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Kechen Lu <kechenl@nvidia.com>
---
 Documentation/virt/kvm/api.rst     |  5 +++--
 arch/x86/include/asm/kvm-x86-ops.h |  1 +
 arch/x86/include/asm/kvm_host.h    |  2 ++
 arch/x86/kvm/svm/svm.c             | 10 ++++++++++
 arch/x86/kvm/vmx/vmx.c             | 10 ++++++++++
 arch/x86/kvm/x86.c                 | 12 ++++++++++++
 6 files changed, 38 insertions(+), 2 deletions(-)
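[As a usage illustration of the deployment the commit message describes, a VMM
might disable HLT exits only on the vCPUs pinned to dedicated pCPUs. A minimal
sketch, assuming the per-vCPU support this series proposes; the function name,
fd array, and pinning policy are illustrative.]

```c
#include <stdbool.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Disable HLT exits on the subset of vCPUs that have dedicated pCPUs,
 * leaving HLT intercepted on the remaining, time-shared vCPUs. */
static void disable_hlt_exits_on_pinned(int *vcpu_fds, bool *is_pinned, int nr)
{
	struct kvm_enable_cap cap = {
		.cap = KVM_CAP_X86_DISABLE_EXITS,
		.args[0] = KVM_X86_DISABLE_EXITS_HLT,
	};
	int i;

	for (i = 0; i < nr; i++) {
		if (is_pinned[i])
			ioctl(vcpu_fds[i], KVM_ENABLE_CAP, &cap);
	}
}
```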